A Note on a Simple and Practical Randomized Response Framework for Eliciting Sensitive Dichotomous and Quantitative Information

09/17/2019 ∙ by Carel F. W. Peeters, et al. ∙ Victoria University of Wellington ∙ VUmc

Many issues of interest to social scientists and policymakers are of a sensitive nature in the sense that they are intrusive, stigmatizing or incriminating to the respondent. This results in refusals to cooperate or evasive cooperation in studies using self-reports. In a seminal article Warner proposed to curb this problem by generating an artificial variability in responses to inoculate the individual meaning of answers to sensitive questions. This procedure was further developed and extended, and came to be known as the randomized response (RR) technique. Here, we propose a unified treatment for eliciting sensitive binary as well as quantitative information with RR based on a model where the inoculating elements are provided for by the randomization device. The procedure is simple and we will argue that its implementation in a computer-assisted setting may have superior practical capabilities.

1. Introduction

Many issues that are of interest to social scientists and policymakers are of a sensitive nature. A topic can be said to be ‘sensitive’ when disclosing information about it is threatening to the respondent because it is potentially stigmatizing, incriminating or severely intrusive [17, 18]. Because of these potential threats, people have self-representational concerns, leading to refusals to cooperate or evasive cooperation [see, e.g., 13]. The research problems associated with these behaviors tend to inhibit their adequate measurement, and lead one to dismiss the general assumption of sampling theory that data collected on units in the sample are accurate representations of the values associated with the units sampled.

In a seminal article, Stanley Warner [30] proposed to use the element of chance stemming from a randomization device to inoculate individual responses to sensitive inquiries in order to reduce nonsampling bias. Subsequently, the population estimates of $\pi$, the population proportion indulging in sensitive trait X, will become more accurate. This initial idea grew into a family of techniques commonly referred to as randomized response (RR), whose core characteristic is the insertion of random error by an element of chance to provide a respondent optimal privacy protection. For an in-depth review of developments in RR methods we confine ourselves to referring to Deffaa [7], Fox & Tracy [9], Chaudhuri & Mukerjee [5], and Tracy & Mangat [27].

The psychological premises of the RR method should be clearly understood: If the respondent understands that RR objectively guarantees privacy or trusts the method to provide full anonymity in the disclosure of information (whether fully understanding it or not), he or she is relieved from self-representational concerns and will be more inclined to cooperate and will do so in a nonevasive manner. While generally RR is thus thought able to eliminate or relieve both cooperation refusal and evasive cooperation, we view the RR method mainly as a technique for relieving the latter. As one has to actually start responding to an RR survey before fully grasping its relative merit and assurances with regard to anonymity and data confidentiality, it can be expected that RR will be more effective in reducing evasiveness than in reducing response refusal. It is in this light that the developments in the remainder should be read.

Methods for quantitative RR have been given less attention in the past than their binary counterparts, and existing models and proposed randomization efforts often prove impractical. Many new developments in RR seem absorbed by the technical fix, neglecting its purpose as a technique for improving the validity of observational data. This short research note has two aims with regard to communicating our current work on the RR method: (1) to propose a simple unified framework for eliciting both binary and quantitative information with RR (Section 2) and (2) to describe features of the incorporation of the proposed RR procedures in a computer-assisted environment (Section 3). Our framework stresses RR as a survey technique that should have high practicability, as we will argue in the concluding discussion.

2. Proposed Framework

An efficient procedure for RR is to let the randomization device provide for the inoculating response. Richard Morton [as cited in 10] and Robert Boruch [2, 3] almost simultaneously proposed an equivalent dichotomous procedure that has this specific design characteristic, now commonly referred to as forced randomized response (FRR). Letting the randomization device provide for the inoculating response combines efficiency, simplicity, and practicability when developing inquiries into large numbers of sensitive behaviors. Here we will give an account of the dichotomous FRR procedure and subsequently extend this model to a polychotomous appreciation of quantitative RR, so as to arrive at a certain unity in the RR framework.

2.1. Binary Forced Randomized Response Model

Consider a sensitive trait X for which the population is dichotomous. A random sample of $n$ persons is drawn. The goal is the estimation of the population prevalence of X ($\pi$). To this purpose each person in the sample is asked the question: “Do you possess trait X?” Each respondent receives a randomization device. In previous studies a pair of dice was used, accompanied by the instructions to answer truthfully if the outcome is 5, 6, 7, 8, 9 or 10; to answer with ‘yes’ if the outcome is 2, 3 or 4; or to answer with ‘no’ if the outcome is 11 or 12 [29]. We propose to use a (digital) binary spinner along with the following instructions: Turn the spinner; if it stops on an empty area, respond with ‘yes’ or ‘no’ truthfully; if it stops on an area imprinted with ‘yes’, answer ‘yes’; if it stops on an area imprinted with ‘no’, answer ‘no’.
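The selection probabilities implied by the two-dice instructions can be checked by enumerating all 36 equally likely outcomes. The following is a minimal sketch for illustration only; the function name and dictionary keys are hypothetical and do not come from the original procedure.

```python
from fractions import Fraction
from itertools import product


def dice_selection_probabilities():
    """Classify the sum of two fair dice according to the quoted instructions."""
    counts = {"truthful": 0, "forced_yes": 0, "forced_no": 0}
    for a, b in product(range(1, 7), repeat=2):
        total = a + b
        if 5 <= total <= 10:      # answer truthfully
            counts["truthful"] += 1
        elif total <= 4:          # outcomes 2, 3, 4: forced 'yes'
            counts["forced_yes"] += 1
        else:                     # outcomes 11, 12: forced 'no'
            counts["forced_no"] += 1
    # Each of the 36 ordered outcomes has probability 1/36.
    return {key: Fraction(value, 36) for key, value in counts.items()}


probs = dice_selection_probabilities()
for key, value in probs.items():
    print(key, value)
```

Under these instructions three out of four respondents are expected to answer truthfully, while forced 'yes' answers are twice as likely as forced 'no' answers.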


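In this setup we define (using notation by Lang [14])

$$Y_i = \begin{cases} 1 & \text{if person } i \text{ possesses trait X} \\ 0 & \text{otherwise} \end{cases}$$

and

$$Z_i = \begin{cases} 1 & \text{if the answer of person } i \text{ is ‘yes’} \\ 0 & \text{otherwise} \end{cases}$$

with $i = 1, \ldots, n$.

Let $\pi = P(Y_i = 1)$ denote the population prevalence of X, and let $p_1$, $p_2$ and $p_3 = 1 - p_1 - p_2$ denote the selection probabilities of a truthful answer, a forced ‘yes’ and a forced ‘no’, respectively. Taking into account the assumptions regarding RR, events can be expressed in probabilities, so that

$$\lambda = P(Z_i = 1) = P(Z_i = 1 \mid Y_i = 1)\,\pi + P(Z_i = 1 \mid Y_i = 0)\,(1 - \pi) = (p_1 + p_2)\,\pi + p_2\,(1 - \pi) = p_1 \pi + p_2. \tag{1}$$

As the probability of a ‘yes’ response can be estimated by the sample proportion answering ‘yes’, $\hat{\lambda} = n^{-1} \sum_{i=1}^{n} Z_i$, and as $p_1$ and $p_2$ are fixed by design, an unbiased moment estimator of the population prevalence of X can be obtained by

$$\hat{\pi} = \frac{\hat{\lambda} - p_2}{p_1} \tag{2}$$

with sampling variance

$$\mathrm{Var}(\hat{\pi}) = \frac{\lambda (1 - \lambda)}{n\, p_1^2} \tag{3}$$

and unbiased sampling estimate

$$\widehat{\mathrm{Var}}(\hat{\pi}) = \frac{\hat{\lambda} (1 - \hat{\lambda})}{(n - 1)\, p_1^2}. \tag{4}$$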
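The binary forced-response estimator can be illustrated with a short simulation. The sketch below assumes full compliance with the instructions, a truthful-answer probability `p_truth` and a forced-'yes' probability `p_forced_yes`; all function and parameter names are hypothetical, and the true prevalence of 0.20 is an arbitrary value chosen for the example.

```python
import random


def frr_binary_estimate(n_yes, n, p_truth, p_forced_yes):
    """Moment estimate of the prevalence and its estimated sampling variance."""
    lam_hat = n_yes / n                                   # observed share of 'yes'
    pi_hat = (lam_hat - p_forced_yes) / p_truth           # remove the forced 'yes' mass
    var_hat = lam_hat * (1 - lam_hat) / ((n - 1) * p_truth ** 2)
    return pi_hat, var_hat


def simulate_frr_binary(n, prevalence, p_truth, p_forced_yes, seed=0):
    """Simulate the number of 'yes' answers, assuming every respondent complies."""
    rng = random.Random(seed)
    n_yes = 0
    for _ in range(n):
        u = rng.random()
        if u < p_truth:                                   # empty area: truthful answer
            n_yes += rng.random() < prevalence
        elif u < p_truth + p_forced_yes:                  # area imprinted with 'yes'
            n_yes += 1
        # otherwise: area imprinted with 'no', nothing to count
    return n_yes


n = 20000
n_yes = simulate_frr_binary(n, prevalence=0.20, p_truth=0.75, p_forced_yes=1 / 6)
pi_hat, var_hat = frr_binary_estimate(n_yes, n, p_truth=0.75, p_forced_yes=1 / 6)
print(round(pi_hat, 3), round(var_hat ** 0.5, 4))
```

Dividing by the squared truthful-answer probability shows the price of protection: the smaller that probability, the larger the variance of the prevalence estimate for a given sample size.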

2.2. Discrete Quantitative Forced Randomized Response Model

In inquiries into sensitive behaviors not only questions of prevalence are of interest, but also questions of frequency of occurrence. To evaluate frequency of occurrence, RR models must be used that deal with quantitative data. Here, a discrete quantitative FRR model is developed for which previous work by Greenberg et al. [11], Eriksson [8], Liu & Chow [21], and Stem & Steinhorst [26] is acknowledged. Consider a sensitive trait X, for which the population is supposed to be continuous or discretely quantitative. A random sample of $n$ persons is drawn. Instead of focussing on a mean we can devise a model which redirects X to be discrete, assuming values $1, \ldots, k$ with respective unknown true proportions $\pi_1, \ldots, \pi_k$, where $\pi_j \geq 0$ and $\sum_{j=1}^{k} \pi_j = 1$. Each of those values can be assigned a category of numbers, for example in a setup where k = 6: The responses 1, 2, 3, 4, 5 and 6 could correspond with the respective categories ‘0’, ‘1 time’, ‘2 to 3 times’, ‘4 to 5 times’, ‘6 to 10 times’, and ‘more than 10 times’. We are then interested in estimating the proportion in each of $k$ discrete quantitative categories, providing for a multi-proportional or polychotomous appreciation of quantitative RR [25].

For such a setup we could again use dice as the randomizing device. In this case, however, 2 throws with 3 dice are needed: the outcomes with selection probabilities $p_2$ and $p_3$ (the forced ‘yes’ and forced ‘no’ outcomes of the previous section) require the third die to be thrown. The number that turns up in this throw gives the forced response. There are two problems associated with the use of three dice as randomization device. First, two-step procedures introduce an extra source of error due to increased respondent burden. Second, using dice provides a setup where any piece of quantitative information can be misclassified into exactly 6 discrete categories. To move beyond these 6 categories and to decrease the respondent burden one could also supply each respondent with a (digital) discrete quantitative spinner such as given in Section 3 along with the following instructions: Turn the spinner; if it stops on an empty area, respond with ‘1’, ‘2’, …, or ‘k’ truthfully; if it stops on an area imprinted with ‘1’, ‘2’, …, or ‘k’, respond accordingly. The selection probabilities for the truthful answer and the forced responses are $p_1$ and $p_{2j}$ ($j = 1, \ldots, k$) respectively, with $p_2 = \sum_{j=1}^{k} p_{2j}$ and $p_1 + p_2 = 1$.

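In this setup we define

$$Y_i = j \quad \text{if person } i \text{ belongs to category } j$$

and

$$Z_i = j \quad \text{if the response of person } i \text{ is } j,$$

with $i = 1, \ldots, n$ and $j = 1, \ldots, k$.

Let $\pi_j = P(Y_i = j)$, and let $p_1$ and $p_{2j}$ denote the selection probabilities of a truthful answer and of forced response $j$. If we let $\lambda_j$ denote the probability of an affirmative quantitative response, again with $j$ denoting the response pointing to one of our $k$ discrete quantitative categories, and taking into account the assumptions regarding RR, it can then be shown analogous to (1) that

$$\lambda_j = P(Z_i = j) = p_1 \pi_j + p_{2j}. \tag{5}$$

As the probability of a certain quantitative response within our category parameter space can be estimated by the sample proportion giving an affirmative response to that quantitative category, $\hat{\lambda}_j = n^{-1} \sum_{i=1}^{n} I(Z_i = j)$, where $I(\cdot)$ denotes the characteristic function, and as $p_1$ and $p_{2j}$ are fixed by design, each of our $k$ population proportions can be unbiasedly estimated by solving (5) for $\pi_j$:

$$\hat{\pi}_j = \frac{\hat{\lambda}_j - p_{2j}}{p_1}. \tag{6}$$

The property of unbiasedness follows from

$$E(\hat{\pi}_j) = \frac{E(\hat{\lambda}_j) - p_{2j}}{p_1} = \frac{\lambda_j - p_{2j}}{p_1} = \pi_j.$$

Utilizing the quality of a multinomial distribution as a joint structure of binomials, the sampling variance for $\hat{\pi}_j$ is

$$\mathrm{Var}(\hat{\pi}_j) = \frac{\lambda_j (1 - \lambda_j)}{n\, p_1^2} \tag{7}$$

with unbiased sampling estimate

$$\widehat{\mathrm{Var}}(\hat{\pi}_j) = \frac{\hat{\lambda}_j (1 - \hat{\lambda}_j)}{(n - 1)\, p_1^2}. \tag{8}$$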
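The category-wise estimator can be sketched in a few lines. The example assumes responses coded 1 to k, a truthful-answer probability `p_truth`, and a list `p_forced` whose entry j-1 is the probability that the device forces response j; these names and the response counts are hypothetical and serve only as an illustration.

```python
from collections import Counter


def frr_quantitative_estimate(responses, k, p_truth, p_forced):
    """Estimate the proportion in each of k categories from forced-RR responses.

    Returns a list of (estimate, estimated variance) pairs, one per category.
    """
    n = len(responses)
    counts = Counter(responses)
    results = []
    for j in range(1, k + 1):
        lam_hat = counts[j] / n                           # observed share of response j
        pi_hat = (lam_hat - p_forced[j - 1]) / p_truth    # subtract forced mass, rescale
        var_hat = lam_hat * (1 - lam_hat) / ((n - 1) * p_truth ** 2)
        results.append((pi_hat, var_hat))
    return results


# Made-up observed responses for k = 6 categories (n = 100).
responses = [1] * 50 + [2] * 20 + [3] * 10 + [4] * 8 + [5] * 7 + [6] * 5
estimates = frr_quantitative_estimate(
    responses, k=6, p_truth=0.75, p_forced=[1 / 24] * 6
)
for category, (pi_hat, var_hat) in enumerate(estimates, start=1):
    print(category, round(pi_hat, 3), round(var_hat ** 0.5, 3))
```

Because the forced probabilities and the truthful-answer probability sum to one, the estimated proportions sum to one as well, although individual estimates may fall outside the unit interval in small samples.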

2.3. General Form

Let $\lambda = (\lambda_1, \ldots, \lambda_k)^{T}$ denote the probabilities of the observed responses with $k$ categories, and let $\pi = (\pi_1, \ldots, \pi_k)^{T}$ denote the probabilities of true answers/status with $k$ categories. The models as developed here can then be generally captured in the function

$$\lambda = P \pi, \tag{9}$$

so that the moment estimator is given by [5]

$$\hat{\pi} = P^{-1} \hat{\lambda}. \tag{10}$$

In the aforementioned

$$P = \begin{pmatrix} p_{11} & \cdots & p_{1k} \\ \vdots & \ddots & \vdots \\ p_{k1} & \cdots & p_{kk} \end{pmatrix}, \qquad p_{ij} = P(\text{observed category } i \mid \text{true category } j), \tag{11}$$

is the matrix of conditional misclassification probabilities, where $i$ denotes observed answers and $j$ denotes true status. Thus, when having a nonsingular $P$ and an unbiased estimate of $\lambda$, then $\pi$ can be estimated by (10). As van den Hout and van der Heijden [28] point out, the assumption that $P$ is nonsingular does not impose much restriction on the misclassification design matrix as $P^{-1}$ exists when the diagonal of $P$ dominates, meaning: $p_{ii} > 1/2$, for $i = 1, \ldots, k$. This is reasonable as the diagonal elements represent correct classification and these probabilities should be taken relatively high for the design to be efficient.
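The general form can be made concrete by building the misclassification matrix of a forced-response design and solving the linear system for the true proportions. The sketch below uses exact fractions and a small hand-written Gauss-Jordan solver to stay free of external libraries; the function names and the observed proportions are hypothetical, and the solver assumes a nonsingular matrix.

```python
from fractions import Fraction


def frr_matrix(p_truth, p_forced):
    """Entry [i][j] is P(observed category i | true category j) under forced RR."""
    k = len(p_forced)
    return [[p_forced[i] + (p_truth if i == j else 0) for j in range(k)]
            for i in range(k)]


def solve(P, lam):
    """Solve P x = lam by Gauss-Jordan elimination (assumes P is nonsingular)."""
    k = len(lam)
    A = [list(map(Fraction, row)) + [Fraction(b)] for row, b in zip(P, lam)]
    for col in range(k):
        pivot = next(r for r in range(col, k) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [x / A[col][col] for x in A[col]]        # scale pivot row to 1
        for r in range(k):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [x - factor * y for x, y in zip(A[r], A[col])]
    return [A[r][k] for r in range(k)]


p_truth = Fraction(3, 4)
p_forced = [Fraction(1, 24)] * 6
P = frr_matrix(p_truth, p_forced)

# Made-up observed response proportions for six categories.
lam_hat = [Fraction(c, 100) for c in (50, 20, 10, 8, 7, 5)]
pi_hat = solve(P, lam_hat)
print([str(p) for p in pi_hat])
```

For this design the matrix solution coincides with subtracting the forced-response probability from each observed proportion and dividing by the truthful-answer probability, and every diagonal entry (19/24) exceeds one half.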

3. Computer-Assisted Randomized Response

The incorporation of computer-assisted self interviewing (CASI) and RR in computer-assisted randomized response (CARR) interviewing has been proposed and successfully tested by Musch, Bröder & Klauer [23], Lensvelt-Mulders et al. [20] and Lensvelt-Mulders & Boeije [19]. CARR could prove to be very important for future inquiries into socially sensitive issues, as other computer-based techniques lose their magic due to their penetration of mainstream culture [1, 20], as respondents’ awareness of the negative possibilities of computers regarding their privacy increases with increasing ‘computer-literacy’, and as the use of the internet makes possible the dissemination of data on a large scale, which may relieve the efficiency problems inherent in RR designs [23]. Moreover, Boeije & Lensvelt-Mulders [1] and Lensvelt-Mulders & Boeije [19] have shown that respondents prefer computer-assisted RR to paper-and-pencil and face-to-face RR.

The use of CARR makes possible the use of accompanying digital randomizers. Here we propose digital spinners for binary and multi-proportional FRR. Spinners have the advantage of being widely known, at least across western populations, and it may be argued that, for those unfamiliar with spinners, their concept is easily grasped. Also, two important problems concerning the randomizers are solved when a digital make-up is used. The first is randomness. Randomness has proven hard to achieve with physical spinners. For example, Stem and Steinhorst [26] used a physical spinner in a mail questionnaire. This spinner could get bent in the mail, which would subsequently affect the randomness of the device. With sufficient programming the digital randomizers give very acceptable and consistent levels of pseudo-randomness. The second problem alleviated with a digital make-up of the randomizing device is the problem of inconclusive outcomes. In the same study by Stem and Steinhorst, the spinner could land on a line, creating the need for additional instructions, which increases the cognitive load for the respondent. Sufficient programming curbs this problem.

Figure 1. A Computer-Assisted Forced RR Questionnaire

Figure 1¹ gives an impression of our incorporation of a (discrete quantitative) FRR spinner into CASI to make for a CARR questionnaire. The upper left corner contains instructions as in Section 2, the upper right corner contains the randomizer, and the bottom part contains the sensitive question with accompanying answering categories. The discrete quantitative FRR spinner of Figure 1 is constructed with $k = 6$, a truthful-answer probability $p_1$ of three-fourths and a total forced-response probability $p_2$ of one-fourth. These selection probabilities are translated into the randomization device by converting probabilities into degrees. Moriarty and Wiseman [22] give evidence that respondents, due to the involved permutations leading to a certain outcome, misperceive selection probabilities when dice are used as a randomization device. If the misperceiving of probabilities can be incorporated in the randomization device, it is possible to provide for sufficient protection while retaining relatively high efficiency. To enhance the misperceiving of selection probabilities the empty and imprinted areas are evenly divided over the spinner, which is divided in 24 sub-areas. This gives every empty area a selection probability of 3/24 (= 1/8) and the analogous selection probability of each of the 6 forced numerical responses is 1/24. These probabilities and number of categories are by no means necessary. The selection probabilities and number of discrete quantitative categories can be chosen at convenience.

¹ The spinners were developed by Peeters [25] and programmed by Rens Lensvelt of WetenschapWerkt. Web-page questionnaires for CASI and CARR as used in Peeters [25] are properties of C.F.W. Peeters, WetenschapWerkt, Utrecht University, and VU University Amsterdam.

Note that in the aforementioned setup of the discrete quantitative FRR spinner a one-step solution is given for the multiple-step misclassification of sensitive data in 6 categories using 3 dice as described in Section 2.2. Also note that by replacing the numbers in the imprinted areas with ‘yes’ and ‘no’, a binary computer-assisted FRR questionnaire is obtained.
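How selection probabilities translate into degrees on a digital spinner can be sketched as follows. This is not the program used for the questionnaire in Figure 1; it is a hypothetical illustration that assumes alternating empty and imprinted areas, with `None` marking an empty (truthful) area. Half-open angle intervals make sure a spin can never land on a line.

```python
import random


def build_spinner(k=6, p_truth=0.75):
    """Return spinner segments as (label, start_degree, end_degree) tuples.

    label None marks an empty area (answer truthfully); label j forces response j.
    """
    empty_width = 360.0 * p_truth / k            # 45 degrees for k=6, p_truth=3/4
    forced_width = 360.0 * (1 - p_truth) / k     # 15 degrees for k=6, p_truth=3/4
    segments, start = [], 0.0
    for j in range(1, k + 1):
        segments.append((None, start, start + empty_width))
        start += empty_width
        segments.append((j, start, start + forced_width))
        start += forced_width
    return segments


def spin(segments, rng=random):
    """Draw a uniform angle in [0, 360) and return the label of the area hit."""
    angle = rng.random() * 360.0
    for label, low, high in segments:
        if low <= angle < high:                  # half-open: no inconclusive outcomes
            return label
    return segments[-1][0]                       # guard against rounding at 360


segments = build_spinner()
rng = random.Random(1)
draws = [spin(segments, rng) for _ in range(24000)]
share_truthful = draws.count(None) / len(draws)
print(round(share_truthful, 3))
```

Relabelling the imprinted areas with 'yes' and 'no' instead of numbers would give the binary variant of the same device.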

4. Discussion

The unified framework for binary and quantitative RR provides several advantages. The design can be adjusted to any efficiency level by selecting certain parameters of the randomization device. Additionally, the spinners are easily incorporated in computerized settings and provide the opportunity of utilizing the psychological advantage of the misperceiving of actual misclassification probabilities [9, 25].

Furthermore, sensitive quantitative information can be misclassified in discrete categories, where the scope of the frequencies denoted with each category can be adapted to the researcher’s need. In previous quantitative designs the desired population means regarding certain sensitive behaviors had to be estimated before any field research efforts, so as to adapt the setup of the randomizer to the expected range of frequencies in the population. After all, the range of responses to the innocuous question has to be similar to the range of possible responses to the sensitive question if one is to unbiasedly estimate frequency of occurrence while still providing sufficient respondent protection [see, e.g., 21, 26]. In the proposed unified approach, the categories can be very easily adapted to expected frequencies while retaining setup and efficiency, as the forced responses have a range that is equal to the range of possible responses on the sensitive question, due to the multi-proportional construction. Respondent protection is thus also statistically guaranteed in the quantitative RR setup.

Notwithstanding the relief RR provides with regard to self-representational concerns, respondents may still cheat, that is, they may not comply with the RR instructions. Assuming clear instructions and respondent understanding, two constructs are theorized to be responsible for non-compliance: respondent jeopardy and risk of suspicion [12]. The former refers to guilty respondents’ risk of being identified as such when responding affirmatively. The latter refers to innocent respondents’ risk of being identified as guilty when responding affirmatively. These risks may prompt self-protective response behavior: evasive answering irrespective of the outcome of the randomization device. Provisions exist in the form of parameter selection and design symmetry.

The optimization of parameter selection for both the binary and quantitative FRR models in terms of reconciling respondent hazards and efficiency can be captured in the Bayesian framework for conditional RR probabilities proposed by Lanke [15, 16] and Greenberg et al. [12]. Moreover, the proposed FRR design is symmetric [4], meaning that no possible response option, in itself, conveys information on one’s true status. While less efficient than asymmetric designs, there is evidence that symmetric designs spur less cheating due to reduced risk of suspicion relative to their asymmetric counterparts [24].

These provisions may not completely eliminate self-protective response behavior. It may subsequently be possible that the percentage of affirmative responses (in certain response categories) falls below chance level. We would then have an estimator that lies outside the interior of the parameter space, implying that the moment estimator (10) is no longer equivalent to the maximum likelihood estimator. It is here that we may also see the analytical advantage of the unified RR framework in that it allows for a certain unity in the analysis of dichotomous and quantitative RR data. Van den Hout and van der Heijden [28] give an elegant framework for analyzing RR data based on a log-linear latent class model analogy. Their framework unifies the chi-square test of independence and numerical maximum likelihood estimation for models of the general form (9). Their log-linear RR approach may be extended to provide prevalence estimates corrected for self-protective response behavior [6].
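The difference between the moment estimator and the maximum likelihood estimator when responses fall below chance level can be illustrated numerically. The sketch below uses EM iterations for a misclassification model of the general form, which is one standard way to obtain a likelihood-based estimate that stays inside the parameter space; it is an assumption made for illustration and not necessarily the numerical procedure of the cited framework. The function name and the response counts are hypothetical.

```python
def em_estimate(counts, P, n_iter=500):
    """EM iterations for the true proportions when observed = P times true.

    counts[i] is the number of observed responses in category i and
    P[i][j] = P(observed i | true j). Iterates remain in [0, 1] and sum to one.
    """
    k = len(counts)
    n = sum(counts)
    pi = [1.0 / k] * k                                   # start from uniform proportions
    for _ in range(n_iter):
        lam = [sum(P[i][j] * pi[j] for j in range(k)) for i in range(k)]
        # Reweight each true category by its expected share of the observed counts.
        pi = [pi[j] * sum(counts[i] * P[i][j] / lam[i] for i in range(k)) / n
              for j in range(k)]
    return pi


# Binary design: index 0 = 'yes', index 1 = 'no'; truthful 3/4, forced yes 1/6, forced no 1/12.
P = [[0.75 + 1 / 6, 1 / 6],
     [1 / 12, 0.75 + 1 / 12]]

# Below chance level: 10% 'yes' although at least 1/6 is expected under compliance.
moment_below = (100 / 1000 - 1 / 6) / 0.75               # negative moment estimate
em_below = em_estimate([100, 900], P)

# Interior case: both approaches agree.
moment_interior = (300 / 1000 - 1 / 6) / 0.75
em_interior = em_estimate([300, 700], P)

print(round(moment_below, 3), round(em_below[0], 6))
print(round(moment_interior, 3), round(em_interior[0], 3))
```

In the below-chance case the moment estimate is negative while the EM iterates settle on the boundary value of zero; in the interior case the two estimates coincide.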

Adding to the attractiveness of practicality, one may note that it is straightforward to incorporate Likert-scale type questions (e.g., to study sensitive attitudes) into the quantitative FRR setup, as the distinct categories 1-6 could be labeled as a Likert scale. The RR framework given earlier thus provides for simple binary and quantitative RR models whose incorporation in a computerized setting may prove to be more practically feasible in real research settings. The models and CARR questionnaires as described above have been tested in our lab and field-research efforts are currently being undertaken.

Acknowledgements

At the time of writing C.F.W.P. was affiliated with the Department of Methodology & Statistics at Utrecht University and K.L. was affiliated with the Department of Governance Studies, VU University Amsterdam. This version is a postprint of: Peeters, C.F.W., Lensvelt-Mulders, G.J.L.M., & Lasthuizen, K. (2010). A Note on a Simple and Practical Randomized Response Framework for Eliciting Sensitive Dichotomous and Quantitative Information. Sociological Methods & Research, 39: 283–296.

References

  • [1] Boeije, H.R. & Lensvelt-Mulders, G.J.L.M. (2002). Honest by Chance: A Qualitative Interview Study to Clarify Respondents’ (Non)-Compliance with Computer Assisted Randomized Response. Bulletin de Methodologie Sociologique, 75:24–39.
  • [2] Boruch, R.F. (1971). Assuring Confidentiality of Responses in Social Research: A Note on Strategies. American Sociologist, 6:308–311.
  • [3] Boruch, R.F. (1972). Relations Among Statistical Methods for Assuring Confidentiality of Social Research Data. Social Science Research, 1:403–414.
  • [4] Bourke, P.D. & Dalenius, T. (1976). Some New Ideas in the Realm of Randomized Inquiries. International Statistical Review, 44:219–221.
  • [5] Chaudhuri, A. & Mukerjee, R. (1988). Randomized Response: Theory and Techniques. New York: Marcel Dekker.
  • [6] Cruyff, M.J.L.F., van den Hout, A., van der Heijden, P.G.M., & Böckenholt, U. (2007). Log-Linear Randomized-Response Models Taking Self-Protective Response Behavior Into Account. Sociological Methods & Research, 36:266–282.
  • [7] Deffaa, W. (1982). Anonymisierte Befragungen mit Zufallsverschlüsselten Antworten. Die Randomized Response Technik (RRT): Methodische Grundlagen, Modelle und Anwendungen. (Anonymous Questioning with Misclassified Responses. The Randomized Response Technique (RRT): Methodological Assumptions, Models and Uses). Frankfurt am Main: Verlag Peter Lang.
  • [8] Eriksson, S.A. (1973). A New Model for Randomized Response. International Statistical Review, 41:101–113.
  • [9] Fox, J.A. & Tracy, P.E. (1986). Randomized Response. A Method for Sensitive Surveys. Beverly Hills: Sage Publications.
  • [10] Greenberg, B.G., Abul-Ela, A.L.A., Simmons, W.R., & Horvitz, D.G. (1969). The Unrelated Question Randomized Response Model: Theoretical Framework. Journal of the American Statistical Association, 64:520–539.
  • [11] Greenberg, B.G., Kuebler, R.R., Abernathy, J.R., & Horvitz, D.G. (1971). Application of the Randomized Response Technique in Obtaining Quantitative Data. Journal of the American Statistical Association, 66:243–250.
  • [12] Greenberg, B.G., Kuebler, R.R., Abernathy, J.R., & Horvitz, D.G. (1977). Respondent Hazards in the Unrelated Question Randomized Response Model. Journal of Statistical Planning and Inference, 1:53–60.
  • [13] Junger, M. (1990). Discrepancies Between Police and Self-Report Data for Dutch Racial Minorities. The British Journal of Criminology, 29:273–283.
  • [14] Lang, S. (2004). Randomized Response: Befragungstechniken zur Vermeidung von Verzerrungen bei sensitiven Fragen. (Randomized Response: Questioning Techniques for Curbing Bias when asking Sensitive Questions). Habilitations-Probevorlesung Universität München, Institut für Statistik.
  • [15] Lanke, J. (1975). On the Choice of the Unrelated Question in Simmons’ Version of Randomized Response. Journal of the American Statistical Association, 70:80–83.
  • [16] Lanke, J. (1976). On the Degree of Protection in Randomized Interviews. International Statistical Review, 44:197–203.
  • [17] Lee, R.M. (1993). Doing Research on Sensitive Topics. London: Sage Publications.
  • [18] Lee, R.M. & Renzetti, C.M. (1990). The Problems of Researching Sensitive Topics: An Overview and Introduction. American Behavioral Scientist, 33:510–528.
  • [19] Lensvelt-Mulders, G.J.L.M. & Boeije, H.R. (2007). Evaluating Compliance with a Computer Assisted Randomised Response Technique: A Qualitative Study Into the Origins of Lying and Cheating. Computers in Human Behavior, 23:591–608.
  • [20] Lensvelt-Mulders, G.J.L.M., van der Heijden, P.G.M., Laudy, O., & van Gils, G. (2006). A validation of a Computer-Assisted Randomized-Response Survey to Estimate the Prevalence of Fraud in Social Security. Journal of the Royal Statistical Society A, 169:305–318.
  • [21] Liu, P.T. & Chow, L.P. (1976). A New Discrete Quantitative Randomized Response Model. Journal of the American Statistical Association, 71:72–73.
  • [22] Moriarty, M. & Wiseman, F. (1976). On the Choice of a Randomization Technique With the Randomized Response Model. Pp. 624–626 in Proceedings of the American Statistical Association, Social Statistics Section. Washington, DC: American Statistical Association.
  • [23] Musch, J., Bröder, A., & Klauer, K.C. (2001). Improving Survey Research on the World-Wide Web using the Randomized Response Technique. Pp. 179–192 in Dimensions of Internet Science, edited by U.D. Reips and M. Bosnjak. Lengerich, Germany: Pabst Science Publishers.
  • [24] Ostapczuk, M., Moshagen, M., Zhao, Z., & Musch, J. (2009). Assessing Sensitive Attributes Using the Randomized Response Technique: Evidence for the Importance of Response Symmetry. Journal of Educational and Behavioral Statistics, 34:267–287.
  • [25] Peeters, C.F.W. (2005). Measuring Politically Sensitive Behavior. Using Probability Theory in the Form of Randomized Response to Estimate Prevalence and Incidence of Misbehavior in the Public Sphere: A Test on Integrity Violations. Amsterdam, the Netherlands: Dynamics of Governance, Vrije Universiteit.
  • [26] Stem, D.E. & Steinhorst, R.K. (1984). Telephone Interview and Mail Questionnaire Applications of the Randomized Response Model. Journal of the American Statistical Association, 79:555–564.
  • [27] Tracy, D.S. & Mangat, N.S. (1996). Some Developments in Randomized Response Sampling During the Last Decade - A Followup of Review by Chaudhuri and Mukerjee. Journal of Applied Statistical Science, 4:533–544.
  • [28] van den Hout, A. & van der Heijden, P.G.M. (2004). The Analysis of Multivariate Misclassified Data with Special Attention to Randomized Response Data. Sociological Methods & Research, 32:384–410.
  • [29] van der Heijden, P.G.M., van Gils, G., Bouts, J., & Hox, J.J. (2000). A Comparison of Randomized Response, Computer-Assisted Self-Interview, and Face-to-Face Direct Questioning: Eliciting Sensitive Information in the Context of Welfare and Unemployment Benefit. Sociological Methods & Research, 28:505–537.
  • [30] Warner, S.L. (1965). Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias. Journal of the American Statistical Association, 60:63–69.