We develop a novel nonparametric identification result for the following class of models,
is a vector of additively separable index functions while, and are all vector-valued functions of dimension . The arguments and represent the values of two sets of regressors, and , while corresponds to values of a set of control variables, . We take as high-level assumption that we know (have observed from data) the function , for in the support of , from which we then wish to identify the unknown functions and , while we treat the function as being known. We refer to this class of models as index models since and are restricted to enter the model through and , respectively. We make three major contributions relative to the existing literature:
First, we do not impose any large support conditions on any of the regressors in our model. Most existing results on identification within this class of models require availability of a set of “special” continuously distributed regressors; identification is then achieved by sending each of these special regressors off to the boundary of their support. Estimators based on such “thin set identification” arguments were analyzed by Khan2010, who showed that they tend to be irregularly behaved with slow convergence rates. In contrast, we achieve identification as long as the random index exhibits sufficient, but potentially bounded, variation. We expect this to translate into better behaved estimators.
Second, we impose weak conditions on the functions of interest and the distributions of the random variables. We do not require continuity or differentiability of the functions entering the model in order to show identification, whereas most existing results at a minimum require these to be differentiable. Similarly, we only require to have continuous support while can both be discrete, continuous or a mix of the two as long as their supports satisfy certain conditions. Thus, our results cover models with thresholds and kinks in , and , which existing results cannot handle. In the case of discrete choice models such features may occur if the decision maker optimizes subject to constraints; see, e.g., Cantillo2006a. These models have traditionally been formulated in a parametric fashion; our theory demonstrates how these can be identified without parametric constraints. There is a growing literature on nonparametric estimation with unknown thresholds and kinks which we conjecture can be employed in our setting in order to translate our identification result into actual estimators; see, e.g., Chiou2018.
Third, we show how the presence of the controls can help to achieve identification in a nontrivial way: We first show local identification at each value of the control . Suitable variation in then allows us to piece the locally identified components together across different values of to achieve global identification. In comparison, most other papers that allow for control variables show identification at a fixed arbitrary value of in which case variation in is unnecessary for identification.
Our proof strategy relies on arguments from general topology that, to our knowledge, are completely new to the literature on nonparametric identification. These should be of general interest since they can be used for identification in other settings. The two key elements of our approach are the notions of relative identification and connected sets. Below, we state our formal definition of the former:
A function is said to be relatively identified on a given set if identification of at some point implies that is also identified at all other .
Next, recall the topological notion of connectedness: A connected set cannot be contained in the union of two non-empty disjoint open sets while having non-empty intersection with both. In particular, it is not possible to split a connected open set into two non-empty disjoint open subsets.
Our identification strategy then proceeds in three steps where we here initially suppress the presence of for simplicity: First, we decompose the support of into suitable subsets and achieve relative identification on each of these. This is done via two features of our model: For a given , we are able to identify the relative variation in , with , through the observed variation in w.r.t. through the known function . By injectivity of , we are then able to identify the relative value of which in turn yields the relative value of on suitably chosen subsets of the support of . Second, we achieve global identification on the union of these subsets by using the second main ingredient of our proof strategy, connectedness: We will require the support of to be connected which is used to extend relative local identification to global identification. Finally, reintroducing , we again rely on the supports of to be suitably connected across different values of in the support of to enlarge the identification region further.
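The first step of this strategy can be illustrated with a minimal numerical sketch. Everything in it is an assumption made for illustration and not part of our formal setup: the names `link`, `g`, `h` and `F`, the scalar logistic link, the identity choice for the known index component, and the bounded interval on which the first regressor varies. The researcher is assumed to evaluate only `F` and `g`; matching two values of `F` and using injectivity of the link then reveals the difference of the unknown component between two points, even though that component is discontinuous and the regressor has bounded support.

```python
import math

# --- Hypothetical primitives, hidden from the researcher ---
def link(u):
    """Injective link; the logistic CDF is an illustrative choice."""
    return 1.0 / (1.0 + math.exp(-u))

def h(w):
    """Unknown index component; deliberately discontinuous in w."""
    return 0.5 if w > 0 else -0.3

# --- Objects the researcher is assumed to know ---
def g(x):
    """Known index component (identity, for illustration)."""
    return x

def F(x, w):
    """Observed function: link applied to the additive index g(x) + h(w)."""
    return link(g(x) + h(w))

def match_x(target, w, lo=-1.0, hi=1.0, tol=1e-12):
    """Bisection for x in [lo, hi] with F(x, w) == target (F increases in x)."""
    if not (F(lo, w) <= target <= F(hi, w)):
        raise ValueError("target is outside the range attainable on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if F(mid, w) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def relative_h(w_ref, w_new, x_ref=0.5):
    """Recover h(w_new) - h(w_ref) using only F and g."""
    x_match = match_x(F(x_ref, w_ref), w_new)
    # Injectivity of the link gives g(x_ref) + h(w_ref) = g(x_match) + h(w_new).
    return g(x_ref) - g(x_match)

diff = relative_h(w_ref=-1.0, w_new=1.0)
print(round(diff, 6))
```

The recovered difference is relative only: the level of the unknown component remains undetermined until a normalization is imposed.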
Like us, Berry2018 and Evdokimov2010, among others, rely on connectedness to achieve global identification, but in these papers the restriction is imposed directly on the support of the covariates, thereby implicitly restricting the covariates to be continuous. In contrast, we impose connectedness on the image of and so allow for both and to contain discrete components.
Two leading examples that fall within our general framework are nonparametric additive versions of multiple discrete choice and competing risk models as shown in the next section. There is a large literature on identification and estimation of semiparametric multinomial choice models (see, e.g., Manski1975; Lewbel2000). In contrast, the literature on nonparametric identification is quite thin with few results having been developed since the seminal work of Matzkin1993. In terms of modelling, Matzkin1993 is probably the most closely related to our setting, but the assumptions made and the identification strategy pursued in that paper are very different from ours. Her set of assumptions and ours are not clearly ranked: some of our assumptions are stronger than hers while others are weaker. One key feature of her proof strategy is the introduction of assumptions that ensure the multinomial model can be converted into a binary choice problem followed by a thin-set identification argument. More recently, Allen2019 provide conditions under which one can identify how regressors alter the desirability of alternatives using only average demands. Their conditions are weaker than ours but, on the other hand, they are only able to identify certain features of the model, not the underlying data-generating structure.
There is also a nascent literature on nonparametric identification of so-called BLP models (Berry1995) as used in industrial organization; see, for example, Berry2018 and Chiappori2018. The setting of the BLP model is somewhat different, though, since there the choice probabilities are treated as observed variables which depend on unobserved product characteristics that have to be controlled for. This leads to a different identification problem compared to ours.
Finally, there is also a literature on identification in competing risk models. The two most closely related papers in terms of modelling are Heckman1989 and Lee2013. Heckman1989 achieve identification by assuming the index (in our notation ) has support on and then identify a given component of the index by letting the other components go to zero, and so their result falls in the thin-set identification category. Abbring2003a weaken this assumption substantially for the class of mixed proportional hazard models, a subclass of competing risk models. Lee2013 provide a high-level assumption for identification of the general model involving a rank condition of an integral operator. Primitive conditions for this to hold are not known. Honore2006 derive bounds for the functions of interest when only discrete covariates are available. We complement these studies by showing identification in the general competing risk model under primitive conditions that allow for the presence of discrete covariates, but at the same time impose more structure on the index, cf. eq. (1.2).
In the next section, we give two motivating examples in the form of a random utility model and a competing risk model that both fall within the setting of eq. (1.1). We present our general framework and the assumptions we work under in Section 3, and provide our identification results in Section 4. Section 5 applies our general result to the two examples and Section 6 concludes.
2 Two Motivating Examples
The model (1.1) comprises a range of models that are encountered in economics. We here present two classes of models that fall within our framework. We will return to these two classes of models in Section 5 where we apply our general identification result to each of them.
2.1 Discrete choice models
We here first demonstrate that the class of additive random utility models (ARUM) can be mapped into (1.1). Using existing results in the literature, this in turn implies that our results also apply to a broad class of rational inattention discrete choice models (Fosgerau2016r) and an even wider class of perturbed utility models.
2.1.1 Additive random utility
Consider an agent choosing between alternatives, each carrying an associated indirect utility of the form
where is a set of observed covariates while is unobserved. This model was initially proposed by McFadden1974a and has since become one of the workhorses in applied microeconomics; see, e.g., Ben-Akiva1985 and Maddala1983. As is standard in the literature, we impose the following normalization on the “outside” option : .
Some of the regressors may potentially be dependent on . To handle this situation, we assume the availability of a set of control variables so that are independent of conditional on . In addition to , the researcher also observes the utility maximizing choice, . Thus, the conditional choice probabilities (CCP’s),
are identified in the population. We collect these in the vector-valued function where we leave out the CCP of the outside option. It now follows from standard results in the literature that can be written in the form (1.1) with being the gradient of the so-called surplus function; see Section 5 for further details.
Our identification result requires the researcher to group the observed covariates into two sets: The first set, denoted , contains the “special” regressors that enter the index through a known function as specified by the researcher, cf. eq. (1.2). The second set, denoted , then enters through which is left unspecified. The choices of and are application-specific and should be guided by two considerations: First, need to exhibit sufficient continuous variation on since this is a key requirement for our identification result to go through. Second, since affects the utility of the th alternative positively by definition, it should be specified accordingly.
As an example of this joint modelling and identification strategy, let us consider the problem of estimating willingness-to-pay for different goods, a common problem in various applied fields of economics (e.g., Fosgerau2006d; Bontemps2016). In this setting, choosing to be , where is the price of alternative , , transforms a positive price vector into a vector that can in principle attain values in all of . With this choice, captures the log willingness to pay for good , where contains characteristics of the agent and other characteristics of the different alternatives. Prices generally exhibit continuous variation and so satisfy the first of the two aforementioned requirements. This example assumes the availability of alternative specific regressors, . However, our identification result may still be applied if this is not true. In this case, the researcher needs to construct alternative-specific regressors from a set of underlying covariates .
Our assumption of being known has antecedents in the literature on identification in discrete choice models. For example, in the context of binary choice (), Lewbel2000 also assumes the presence of a “special” regressor, in our notation , that enters the utility of alternative 1 in a known fashion. That paper, however, further restricts to be linear and, importantly, achieves identification of through variation of on the boundary of its support. Our identification result does not rely on any such argument.
Our framework also includes so-called rational inattention discrete choice models. Fosgerau2016r show that any ARUM satisfying the conditions above is observationally equivalent to a rational inattention discrete choice model in which the prior is held constant. This generalizes the finding of Matejka2015, who show that the multinomial logit model has a foundation as a rational inattention model. Thus, our identification result extends directly to a broad class of rational inattention models.
2.1.2 Perturbed utility
The class of perturbed utility models (McFadden2012; Fudenberg2015; Allen2019) is another generalization of the class of ARUM. As shown by Hofbauer2002, the CCP’s of an ARUM can be represented as the solution to a maximization problem where an agent chooses the vector of CCP’s to maximize a function that consists of a linear term and a concave term. Here we present an extended version that includes controls affecting the concave term, i.e.
where is a vector of utility indices, is the unit simplex and is a concave function for each . The perturbed utility model includes ARUM as a special case, while allowing an individual to have strict preference for randomization rather than to choose a vertex of the probability simplex. As noted by Allen2019, observing only realizations of lotteries across choice options is sufficient for identification which requires only the vector of CCP’s, . We show in Section 5 that the implied CCP’s satisfy (1.1).
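As a concrete special case, the following sketch assumes that the concave perturbation is the Shannon entropy and that it does not depend on the controls; this is an illustrative choice, not a restriction used in our results. Under that assumption the maximizer over the unit simplex is the multinomial logit CCP vector and the maximized value is the log-sum-exp of the utility indices. The function names and the coarse grid comparison are hypothetical devices for checking this numerically.

```python
import math

def logit_ccp(u):
    """Softmax over (0, u_1, ..., u_D); position 0 is the outside option."""
    full = [0.0] + list(u)
    m = max(full)
    expu = [math.exp(v - m) for v in full]
    s = sum(expu)
    return [e / s for e in expu]

def perturbed_objective(p, u):
    """Linear term p.u plus Shannon entropy (a concave perturbation)."""
    full = [0.0] + list(u)
    linear = sum(pi * ui for pi, ui in zip(p, full))
    entropy = -sum(pi * math.log(pi) for pi in p if pi > 0.0)
    return linear + entropy

u = [1.0, -0.5]                      # illustrative utility indices
p_star = logit_ccp(u)
best = perturbed_objective(p_star, u)

# Compare with a coarse grid over the interior of the unit simplex.
n = 50
grid_max = max(
    perturbed_objective([i / n, j / n, (n - i - j) / n], u)
    for i in range(1, n)
    for j in range(1, n - i)
)
log_sum_exp = math.log(sum(math.exp(v) for v in [0.0] + u))
print(best >= grid_max, abs(best - log_sum_exp) < 1e-12)
```

Other concave perturbations generally have no closed-form maximizer, in which case the CCPs would have to be computed by numerical optimization over the simplex.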
2.2 Accelerated failure time models for competing risks
Consider a competing risk model as in Heckman1989 with competing causes of failure. A latent failure time is associated with each cause . The econometrician observes the duration until the first failure, , and the associated cause of failure, , together with a set of observed covariates . Assume that the th failure time satisfies
for some function , . The model may then be termed a multivariate generalized accelerated failure time model (Kalbfleisch2002; Fosgerau2013y). The econometrician has knowledge of
for , where is used to control for potential dependence between and . We collect the unobservables in and again require them to be conditionally independent of in which case, as shown in Section 5, defined above again satisfies eq. (1.1).
Typical applications of the above model are in the modelling of (un)employment spells where an exit from the unemployment register can be the result of finding a full-time or a part-time job in different sectors or another change of status. Thus, in this setting, indexes the different exits (types of non-unemployment), and contain both variables characterizing the types of employment (such as salary in a given type/sector of employment) and individual-specific controls (such as age and marital status). Similar to discrete choice models, we would then need to construct to capture risk-specific characteristics with continuous variation and then include all other covariates in . Most empirical applications assume a parametric structure for the index, e.g. . In this setting, requiring to be known effectively amounts to fixing . At the same time, we impose very weak restrictions on the distributional features of the regressors and how they enter the index .
3 General framework
We now return to the general model given in eqs. (1.1)-(1.2) where is assumed to be a known function while and are unknown functions. In the following, let denote the interior of a given set and let denote the support of a given random variable . We then take as given and known to us for all where denote the random variables that we have observed, cf. the examples in the previous section.
The covariates contained in play a special role in our approach in that we need sufficient continuous variation in these to achieve identification. First note that . Thus, sufficient continuous variation of , which is known to us, permits us to identify the relative variation of w.r.t. . Formally, for any given pair , define
We will then throughout implicitly require that some of the open sets , , are non-empty and then achieve identification at the values of for which this is true. A sufficient condition for a given to be non-empty is that the distribution of is continuous and that maps open sets into open sets; however, this is not required and may contain discrete components as long as they fall within the support of the continuous component. Our identification result still applies if some values of a discrete component fall outside the continuous support, but it then excludes these values. This also rules out that some components of are included in since in this case . At the same time, however, can depend on ; we just need sufficient variation in conditional on . Moreover, no continuity restrictions are imposed on the distribution of which may be completely discrete. Finally, we would like to stress that we do not impose any large-support restrictions on , which is in contrast to most existing results in the literature, as discussed in the Introduction. If, for example, , for all , then our result demonstrates that is identified on all of ; but this is not necessary: identification on all of can be achieved without such a full support condition.
denote the support of and , respectively, and
the supports of the same random variables but now only conditioning on . Finally, for some set chosen according to certain assumptions stated below, let
be the supports of and conditional on , respectively. We will then show identification of and for , and . Specifically, will be constructed according to certain properties of the underlying covariates and the functions of interest. Observe the dependence of and on the set . To achieve “maximal” identification, we would ideally like to choose . However, we potentially have to restrict . First, we require to satisfy the following condition for all :
Assumption 3.1. For any , is injective on as defined in (3.8).
By asking for to be injective, we can identify the relative variation in through the observed variation in . In a given application, Assumption 3.1 may not hold for all in which case we need to remove such values from . In the worst case scenario, this leaves us with being empty and our identification result becomes void. At the other extreme, and we may achieve identification on the whole support.
Due to the structure of , it follows from the definition of that and thereby also and are open sets. We add to this by also requiring to be connected for all . An open set is connected if implies that whenever and are nonempty open sets. Thus an open connected set cannot be separated into two non-empty disjoint open sets. We then impose:
Assumption 3.2. is connected for all .
Assumption 3.2 allows us to go from local identification at a given point to relative identification on all of , via the image of . The assumption imposes restrictions on the support of the random variable instead of themselves. This is done in order to impose minimal restrictions on the distribution of and the smoothness of . Recall that is assumed to contain a continuous component. Thus, Assumption 3.2 includes, for example, the case of being unbounded and discrete, or to be continuous while is discontinuous everywhere. Assumption 3.2 is not verifiable from data but the same holds for smoothness conditions that are regularly imposed in existing identification results. If we are willing to entertain certain smoothness conditions, such as the inverse of being continuous with respect to , then the assumption is implied by connectedness of , this latter property being verifiable. Similarly, if we restrict and to both be continuous, it will be implied by connectedness of .
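To see how connectedness of the image can hold with a discrete regressor, consider the following one-dimensional sketch. It assumes, purely for illustration, that the known index component ranges over an open interval and that the unknown component takes finitely many values, so that the image of the index is a finite union of shifted open intervals; the helper names are hypothetical. In one dimension such a union is connected exactly when consecutive intervals strictly overlap.

```python
def index_image(g_range, h_values):
    """Open intervals covered by g + h when g ranges over the open interval g_range."""
    lo, hi = g_range
    return [(lo + hv, hi + hv) for hv in h_values]

def is_connected_union(intervals):
    """Return True if the union of open intervals (a, b) is a single interval.

    Open intervals that merely touch, such as (0, 1) and (1, 2), do not
    overlap: the point 1 is missing, so their union is disconnected.
    """
    ivs = sorted((a, b) for a, b in intervals if a < b)
    if not ivs:
        return False  # convention of this sketch: an empty image is uninformative
    _, reach = ivs[0]
    for a, b in ivs[1:]:
        if a >= reach:  # a gap, or a single missing point, before the next interval
            return False
        reach = max(reach, b)
    return True

# Discrete second regressor with three support points and overlapping shifts.
overlapping = index_image((0.0, 1.0), [0.0, 0.6, 1.2])
# Two support points whose shifted intervals only touch at a single point.
touching = index_image((0.0, 1.0), [0.0, 1.0])

print(is_connected_union(overlapping), is_connected_union(touching))
```

In the first case the image is connected although the second regressor is discrete and the first has bounded support; in the second case the single missing point already breaks connectedness.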
Once we have achieved relative identification on each , , global identification is then reached through the following assumption:
Assumption 3.3. If , then .
This is used to paste together the relatively identified sets across . Again, this assumption does not require and/or to be continuous, only that the sets , overlap. Finally, the following normalization on the function gives us identification on :
Assumption 3.4. There exists known and so that .
Such a normalization is needed to identify the level of since, for any given pair of , we have where and for some given value of .
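The need for a level normalization can be checked numerically. The sketch below is illustrative only: the logistic link, the sine-shaped unknown component, the identity choice for the known component and the shift `c` are all hypothetical. Shifting the unknown index component by a constant and shifting the argument of the link by the same constant leaves the observed function unchanged, so the two structures cannot be told apart without fixing the level at one point.

```python
import math

def link(u):
    """Illustrative link function (logistic CDF)."""
    return 1.0 / (1.0 + math.exp(-u))

def g(x):
    """Known index component (identity, for illustration)."""
    return x

def h(w):
    """One candidate for the unknown index component."""
    return math.sin(w)

c = 0.7  # arbitrary level shift

def h_shifted(w):
    """Alternative candidate: the same component shifted by c."""
    return h(w) + c

def link_shifted(u):
    """Alternative link that absorbs the shift."""
    return link(u - c)

points = [(-0.4, 0.0), (0.3, 1.2), (1.1, -2.0)]
gap = max(
    abs(link(g(x) + h(w)) - link_shifted(g(x) + h_shifted(w)))
    for x, w in points
)
print(gap < 1e-12)
```

Fixing the value of the unknown component at a single known point rules out every non-zero shift and thereby pins down the level.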
4 Main result
As explained earlier, we shall make use of the notion of relative identification in our proof of identification. As a first step, we show relative identification on any two overlapping images of ; this is achieved through injectivity of which allows us to map the overlapping images into overlapping images of .
Lemma 4.1. Suppose that Assumption 3.1 holds, and that is identified at for some . Then the set is identified and is identified on .
By definition, if and only if there exists and so that , and . Using that is injective by Assumption 3.1, the last equality is equivalent to , which we recognize as
where is known to us. Thus, is identified as the set of solutions to (4.10) as we vary . Next, for any given , let and be the corresponding values for which (4.10) holds. Since these are known, the value is also known to us. This in turn implies that is identified.
We then use this lemma in conjunction with the connectedness of to show relative identification on each of the sets :
Let be given and suppose we know the value of . Let be the set on which is identified and let be the corresponding values of . By assumption and so the identified set is non-empty. This in turn implies that is non-empty and open. Now, seeking a contradiction, suppose that . Then define which is also open and non-empty. Since , which is connected according to Assumption 3.2, there must exist and so that . Lemma 4.1 then implies that and is also identified which is a contradiction.
Let be the identified set. By Lemma 4.2, for some . By Assumption 3.4, and so the set is non-empty. Seeking a contradiction, suppose that . By definition and so by Assumption 3.3. This implies that there exists and so that which in turn implies that there exists for which is identified. But then Lemma 4.2 implies that is identified on all of which is a contradiction.
Once we have identified we can also identify :
Let and be given. By definition of , there exists some pair such that . Since and thereby also is identified, the pair is known. But then we also know and so is uniquely identified.
This section applies the general result to the two main examples of Section 2, the ARUM and the competing risk model, and compares our identification results for these two models with existing ones found in the literature. In both examples, we impose the following conditional independence restriction on the error term:
Assumption 5.5. (i) is conditionally independent of , for all for some ; (ii) has a conditional density with full support for all .
We demonstrate in the next two subsections that part (i) implies that , as defined in eqs. (2.3) and (2.6), respectively, can be written in the form (1.1)-(1.2) for all , while part (ii) ensures that the model-specific is injective w.r.t. for all .
Define the surplus function
for any given , where the second equality uses eq. (2.1.1) and Assumption 5.5(i). The Williams-Daly-Zachary Theorem (McFadden1981) then implies that the CCP’s, as defined in (2.3), can be written in the form (1.1)-(1.2) with defined as the gradient of the surplus function,
First, Assumption 3.1, injectivity of for each , is implied by Assumption 5.5(ii), cf. Hofbauer2002. However, Assumption 5.5(ii) is not necessary for injectivity to hold. A simple example is the binomial model, where the probability for alternative 0 is the cumulative distribution of . If the distribution includes point masses, then ties can occur, but this does not destroy injectivity. This is true for any tie-breaking rule. More generally, if the subdifferential of the surplus function is strictly cyclically monotone (Rockafellar1970), which does not require the existence of a density, then the utility maximizing choice probabilities under any tie-breaking rule are injective (Sorensen2019).
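Both the gradient representation and injectivity can be verified directly in the multinomial logit special case. The sketch below assumes i.i.d. extreme value errors, for which the surplus function is the log-sum-exp of the utility indices up to an additive constant that does not affect the gradient; the function names and the numerical values are illustrative. It checks by finite differences that the CCPs equal the gradient of the surplus function, and it inverts the CCPs through log-odds against the outside option.

```python
import math

def surplus(u):
    """Logit surplus: log(1 + sum_d exp(u_d)), outside option normalized to 0."""
    return math.log(1.0 + sum(math.exp(v) for v in u))

def ccp(u):
    """Choice probabilities of alternatives 1..D (outside option omitted)."""
    denom = 1.0 + sum(math.exp(v) for v in u)
    return [math.exp(v) / denom for v in u]

def numerical_gradient(f, u, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = []
    for d in range(len(u)):
        up = list(u)
        dn = list(u)
        up[d] += eps
        dn[d] -= eps
        grad.append((f(up) - f(dn)) / (2.0 * eps))
    return grad

def invert_ccp(p):
    """Recover the index from the CCPs: u_d = log(p_d / p_0)."""
    p0 = 1.0 - sum(p)
    return [math.log(pd / p0) for pd in p]

u = [0.8, -1.2, 0.1]                  # illustrative utility indices
p = ccp(u)
grad = numerical_gradient(surplus, u)
u_back = invert_ccp(p)

print(max(abs(a - b) for a, b in zip(grad, p)) < 1e-7)
print(max(abs(a - b) for a, b in zip(u_back, u)) < 1e-12)
```

The explicit inverse exists here only because of the logit form; in general, injectivity guarantees that an inverse exists without providing it in closed form.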
impose restrictions on the joint variation of. For Assumption 3.2 to hold, we need to identify regressors, , that exhibit enough joint continuous variation so their joint support, conditional on has non-empty interior on . One instance where this can be achieved is if we have observed alternative specific characteristics. In the case of demand modelling, one such choice would be a (transformation of the) (relative) prices of the different alternatives while contains all remaining regressors, possibly including other alternative specific covariates. In this case, to control for potential endogeneity of prices, we could then include cost shifters in . Prices tend to exhibit continuous variation and Assumption 3.2 would be likely to hold. Assumption 3.3 requires other observed product characteristics and the agent’s observed characteristics to exhibit sufficient variation conditional on the controls in so that these have overlapping support across different values of .
As already mentioned in the introduction, there are few fully nonparametric identification results for ARUM. To our knowledge, the only results comparable to ours are found in Matzkin1993. Her results also require the presence of alternative specific regressors but impose stronger conditions on these and other covariates. Moreover, her set-up does not include any control variables. On the other hand, she does not necessarily require that is additive, which we assume throughout. Theorem 1 of Matzkin1993 does allow for dependence between and but in this case, she requires the observed component of the utilities to be identical across alternatives and strictly increasing in one of the arguments. In our notation, this requires , to all be identical. We do not impose any such constraints. Her Theorem 2 requires full independence between and but, on the other hand, imposes fewer restrictions on compared to us. But in both cases, she identifies by letting different components of diverge to , which is an example of the “thin set identification” discussed earlier.
5.2 Perturbed discrete choice
We here demonstrate that the CCP’s for the perturbed discrete choice model again can be expressed in the form (1.1)-(1.2) with defined in (2.4) being injective. This is done under the following restrictions: First, in order to rule out zero demands, the norm of the gradient has to approach infinity as approaches the boundary of the unit simplex. Second, is differentiable (note that we do not require a Hessian). Third, we normalize the outside option so that . Under these three restrictions, for each value of the control , the demand solves the first-order condition for an interior solution,
where is a scalar constant and is a vector consisting of ones. To show that is injective, consider this equation at and and assume that . Define a matrix such that for all . Pre-multiply this matrix onto the first-order condition to obtain that
which implies that as required.
5.3 Competing Risk
where as before while is now defined as the expected log failure time,
where the second equality uses eq. (2.5) and Assumption 5.5(i). The Williams-Daly-Zachary Theorem (McFadden1981) then implies that , now defined by (2.6), can be written in the form (1.1)-(1.2). Injectivity of , as given in eq. (5.11), is obtained by recycling the arguments of the previous subsection except that no normalization of one of the causes of failure is required since the level is included.
Given that the competing risk model and the ARUM share a similar structure, the discussion of the remaining assumptions carries over to the current setting with obvious modifications.
Compared to existing results (Heckman1989; Lee2013) we impose stronger conditions on the index since we require it to be additive and with known. On the other hand, Heckman1989 require to go to zero as diverges, and so rely on a “thin set identification” argument, while Lee2013 rely on a high-level functional rank condition. It is unclear which primitive conditions suffice for this rank condition to hold. Finally, Honore2006 restrict themselves to the case of purely discrete regressors and are only able to derive bounds for objects of interest. We achieve point identification as long as there is some continuous variation in while can be completely discrete.
We have established an identification result for a wide class of index models based on general topological arguments. Three key features of our argument are that smoothness of the model is not required; no large support condition is imposed on the regressors; and control variables may contribute to achieving identification. We leave the development of nonparametric estimators of the identified components for future research.