A Bias-Corrected Estimator for the Crosswise Model with Inattentive Respondents

Yuki Atsusaka; Randolph T. Stevenson

doi:10.1017/pan.2021.43

Published online by Cambridge University Press: 17 December 2021

Yuki Atsusaka*: Affiliation:
Ph.D. Candidate, Department of Political Science, Rice University, 6100 Main Street, Houston, TX 77005, USA. E-mail: [email protected], URL: https://atsusaka.org
Randolph T. Stevenson: Affiliation:
Radoslav Tsanoff Professor of Public Affairs, Department of Political Science, Rice University, 6100 Main St, Houston, TX 77005, USA. E-mail: [email protected], URL: https://www.randystevenson.com
*: Corresponding author Yuki Atsusaka

Abstract

The crosswise model is an increasingly popular survey technique to elicit candid answers from respondents on sensitive questions. Recent studies, however, point out that in the presence of inattentive respondents, the conventional estimator of the prevalence of a sensitive attribute is biased toward 0.5. To remedy this problem, we propose a simple design-based bias correction using an anchor question that has a sensitive item with known prevalence. We demonstrate that we can easily estimate and correct for the bias arising from inattentive respondents without measuring individual-level attentiveness. We also offer several useful extensions of our estimator, including a sensitivity analysis for the conventional estimator, a strategy for weighting, a framework for multivariate regressions in which a latent sensitive trait is used as an outcome or a predictor, and tools for power analysis and parameter selection. Our method can be easily implemented through our open-source software cWise.

Keywords

crosswise model sensitive questions inattentive survey respondents indirect questioning techniques privacy protection

Type: Article
Information: Political Analysis, Volume 31, Issue 1, January 2023, pp. 134-148

DOI: https://doi.org/10.1017/pan.2021.43
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: © The Author(s) 2021. Published by Cambridge University Press on behalf of the Society for Political Methodology

1 Introduction

Political scientists often use surveys to estimate and analyze the prevalence of sensitive attitudes and behavior.¹ To mitigate sensitivity bias in self-reported data (Blair, Coppock, and Moor 2020), such as bias arising from social desirability, self-image protection, fear of disclosing truth, and perceived intrusiveness, various survey techniques have been developed including randomized response techniques, list experiments, and endorsement experiments.² The crosswise model is an increasingly popular design among these techniques.³ A crosswise question shows respondents two statements: one about a sensitive attitude or behavior of interest (e.g., “I am willing to bribe a police officer”) and one about some piece of private, nonsensitive information whose population prevalence is known (e.g., “My mother was born in January”). The respondents are then asked a question, the answer to which depends jointly on the truth status of both statements and so fully protects respondent privacy (Yu, Tian, and Tang 2008). The key idea is that even though researchers only observe respondents’ answers to the joint condition, they can estimate the population prevalence of the sensitive attribute using the known probability distribution of the nonsensitive statement.⁴

Despite its promise and several advantages over other indirect questioning techniques (Meisters, Hoffmann, and Musch 2020b), recent studies suggest that the crosswise model suffers from two interrelated problems, casting doubt on the validity of the design. First, its relatively complex format leads to a significant share of inattentive survey respondents who give answers that are essentially random (Enzmann 2017; Heck, Hoffmann, and Moshagen 2018; John et al. 2018; Schnapp 2019; Walzenbach and Hinz 2019).⁵ Second, this tendency usually results in overestimates of the prevalence of sensitive attributes (Höglinger and Diekmann 2017; Höglinger and Jann 2018; Meisters, Hoffmann, and Musch 2020a; Nasirian et al. 2018). While several potential solutions to this problem have been discussed in the extant literature (Enzmann 2017; Meisters et al. 2020a; Schnapp 2019), they rely on rather strong assumptions and external information from either attention checks or a different unprotected (but still sensitive) question answered by the same respondents, leading to unsatisfactory solutions.⁶ In this article, we propose an alternative solution that builds on insights about “zero-prevalence items” (Höglinger and Diekmann 2017) to correct for the bias arising from inattentive respondents.

More generally, we provide the first detailed description and statistical evaluation of methods for measuring and mitigating the bias caused by inattentive respondents in the crosswise model. This includes an evaluation of the performance of our new estimator and a brief assessment of previous methods. Consequently, we not only offer our method as a solution for estimating prevalence rates without bias, but also explain how its assumptions can be easily evaluated and made to hold by design. It also allows us to develop many extensions that enhance its practical usefulness to researchers, including (1) a sensitivity analysis to simulate the amount of bias caused by inattentive respondents even when our correction is not available; (2) a weighting strategy that allows our estimator to be used with general sampling schemes; (3) a framework for multivariate regressions in which a latent sensitive trait is used as an outcome or a predictor; and (4) simulation-based tools for power analysis and parameter selection. We developed easy-to-use software that allows users to analyze data from the crosswise model with our bias correction and its extensions.

2 Promise and Pitfalls of the Crosswise Model

2.1 The Crosswise Model

The crosswise model was developed by Yu et al. (2008) to overcome several limitations with randomized response techniques (Blair, Imai, and Zhou 2015; Warner 1965). In political science and related disciplines, this design has been used to study corruption (Corbacho et al. 2016; Oliveros and Gingerich 2020), “shy voters” and strategic voting (Waubert de Puiseau, Hoffmann, and Musch 2017), self-reported turnout (Kuhn and Vivyan 2018), prejudice against female leaders (Hoffmann and Musch 2019), xenophobia (Hoffmann and Musch 2016), and anti-refugee attitudes (Hoffmann, Meisters, and Musch 2020). The crosswise model asks the respondent to read two statements whose veracity is known only to her. For example, Corbacho et al. (2016) study corruption in Costa Rica with the following question:

Crosswise Question: How many of the following statements are true?

Statement A: In order to avoid paying a traffic ticket, I would be willing to pay a bribe to a police officer.

Statement B: My mother was born in October, November, or December.

• $\underline {\text {Both}}$ statements are true, or $\underline {\text {neither}}$ statement is true.
• $\underline {\text {Only one}}$ statement is true.

Statement A is the sensitive statement that researchers would have asked directly if there had been no worry about sensitivity bias. The quantity of interest is the population proportion of individuals who agree with Statement A.⁷ In contrast, Statement B is a nonsensitive statement whose population prevalence is ex ante known to researchers.⁸ The crosswise model then asks respondents whether “both statements are true, or neither statement is true” or “only one statement is true.” Respondents may also have the option of choosing “Refuse to Answer” or “Don’t Know.”

Importantly, the respondent’s answer does not allow interviewers (or anyone) to know whether he agrees or disagrees with the sensitive statement, which fully protects his privacy. Nevertheless, the crosswise model allows us to estimate the proportion of respondents for which the sensitive statement is true via a simple calculation. Suppose that the observed proportion of respondents choosing “both or neither is true” is 0.65 while the known population proportion for Statement B is 0.25. If the sensitive and nonsensitive statements are statistically independent, it follows that:

$$ \begin{align} \widehat{\mathbb{P}}(\textsf{TRUE-TRUE} \cup \textsf{FALSE-FALSE}) = 0.65 &\Rightarrow \widehat{\mathbb{P}}(\textsf{A=TRUE})\times \mathbb{P}(\textsf{B=TRUE}) + \widehat{\mathbb{P}}(\textsf{A=FALSE})\times \mathbb{P}(\textsf{B=FALSE}) = 0.65 \\ &\Rightarrow \widehat{\mathbb{P}}(\textsf{A=TRUE})\times 0.25 + (1 - \widehat{\mathbb{P}}(\textsf{A=TRUE}))\times 0.75 = 0.65 \\ &\Rightarrow \widehat{\mathbb{P}}(\textsf{A=TRUE}) = \frac{0.65-0.75}{-0.5} = 0.2, \end{align} $$

where $\widehat{\mathbb{P}}$ is an estimated proportion.
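The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `naive_crosswise` and its argument names are hypothetical and are not taken from the authors' software. It solves the same identity for the prevalence of Statement A, given the observed share of “both or neither is true” answers and the known prevalence of Statement B.

```python
def naive_crosswise(lambda_hat: float, p: float) -> float:
    """Conventional crosswise estimate of the prevalence of Statement A.

    lambda_hat: observed share answering "both or neither is true".
    p:          known population prevalence of the nonsensitive Statement B.
    """
    if p == 0.5:
        # With p = 0.5 the answer carries no information about Statement A.
        raise ValueError("p must differ from 0.5")
    # Solve lambda = pi * p + (1 - pi) * (1 - p) for pi.
    return (lambda_hat + p - 1) / (2 * p - 1)


# Values from the worked example in the text.
estimate = naive_crosswise(lambda_hat=0.65, p=0.25)
print(round(estimate, 3))
```

With the values from the text (0.65 and 0.25), the function reproduces the estimate of 0.2.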

2.2 Relative Advantages and Limitations

Despite its recent introduction to political scientists by Corbacho et al. (2016) and Gingerich et al. (2016), the crosswise model has not yet been widely used in political science. We think that the primary reason is that it has not been clear to many political scientists how and when the crosswise model is preferable to other indirect questioning techniques. To help remedy this problem, Table 1 summarizes relative advantages of the crosswise model over randomized response, list experiments, and endorsement experiments, which are more commonly used survey techniques in political science (see also Blair et al. 2020; Blair, Imai, and Lyall 2014; Hoffmann et al. 2015; Höglinger and Jann 2018; Meisters et al. 2020b; Rosenfeld et al. 2016).⁹

Table 1 Relative advantages of the crosswise model.

Note: This table shows potential (dis)advantages of the crosswise model compared to randomized response techniques (RR), list experiments (List), and endorsement experiments (Endorsement).

The potential advantages of the crosswise model are that the design (1) fully protects respondents’ privacy, (2) provides no incentive for respondents to lie about their answers because there is no “safe” (self-protective) option, (3) does not require splitting the sample (as in list and endorsement experiments), (4) does not need an external randomization device (as in randomized response), (5) is relatively efficient compared with other designs, and (6) asks about sensitive attributes directly (unlike in endorsement experiments). In contrast, the potential disadvantage of this design is that its instructions may be harder to understand compared with those of list and endorsement experiments (but are likely much easier than for randomized response). In addition, the crosswise model requires auxiliary data in the form of a known probability distribution of the nonsensitive information (as in randomized response).

Rosenfeld et al. (2016) show that randomized response appears to outperform list and endorsement experiments by yielding the least biased and most efficient estimate of the ground truth. Since the crosswise model was developed to outperform randomized response (Yu et al. 2008), in principle, the crosswise model is expected to better elicit candid answers from survey respondents than any other technique. To date, several validation studies appear to confirm this expectation (Hoffmann et al. 2015; Hoffmann and Musch 2016; Höglinger and Jann 2018; Höglinger, Jann, and Diekmann 2016; Jann, Jerke, and Krumpal 2012; Jerke et al. 2020; Meisters et al. 2020a).

Recently, however, it has become increasingly clear that the design has two major limitations, which may undermine confidence in the method. First, it may produce a relatively large share of inattentive respondents who give random answers (Enzmann 2017; Heck et al. 2018; John et al. 2018; Schnapp 2019). For example, Walzenbach and Hinz (2019, 14) conclude that “a considerable number of respondents do not comply with the intended procedure” and it “seriously limit[s] the positive reception the [crosswise model] has received in the survey research so far.” Second, it appears to overestimate the prevalence of sensitive attributes and yields relatively high false positive rates (Höglinger and Jann 2018; Kuhn and Vivyan 2018; Meisters et al. 2020a; Nasirian et al. 2018). After finding this “blind spot,” Höglinger and Diekmann (2017, 135) lament that “[p]revious validation studies appraised the [crosswise model] for its easy applicability and seemingly more valid results. However, none of them considered false positives. Our results strongly suggest that in reality the [crosswise model] as implemented in those studies does not produce more valid data than [direct questioning].” Nevertheless, we argue that with a proper understanding of these problems, it is possible to solve them and even extend the usefulness of the crosswise model. To do so, we need to first understand how inattentive respondents lead to bias in estimated prevalence rates.

2.3 Inattentive Respondents Under the Crosswise Model

The problem of inattentive respondents is well known to survey researchers, who have used a variety of strategies for detecting them such as attention checks (Oppenheimer, Meyvis, and Davidenko 2009). Particularly in self-administered surveys, estimates of the prevalence of inattentiveness are often as high as 30%–50% (Berinsky, Margolis, and Sances 2014). Recently, inattention has also been discussed with respect to list experiments (Ahlquist 2018; Blair, Chou, and Imai 2019). Inattentive respondents are also common under the crosswise model, as we might expect given its relatively complex instructions (Alvarez et al. 2019). Researchers have estimated the proportion of inattentive respondents in surveys featuring the crosswise model to be from 12% (Höglinger and Diekmann 2017) to 30% (Walzenbach and Hinz 2019).¹⁰

To proceed, we first define inattentive respondents under the crosswise model as respondents who randomly choose between “both or neither is true” and “only one statement is true.” We assume that random answers may arise due to multiple reasons, including nonresponse, noncompliance, cognitive difficulty, lying, or any combination of these (Heck et al. 2018; Jerke et al. 2019; Meisters et al. 2020a).

We now consider the consequences of inattentive respondents in the crosswise model. Figure 1 plots the bias in the conventional estimator based on hypothetical (and yet typical) data from the crosswise model.¹¹ The figure clearly shows the expected bias toward 0.5 and suggests that the bias grows as the percentage of inattentive respondents increases and as the quantity of interest (labeled as $\pi$) gets close to 0. We also find that the size of the bias does not depend on the prevalence of the nonsensitive item. To preview the performance of our bias-corrected estimator, each plot also shows estimates based on our estimator, which is robust to the presence of inattentive respondents regardless of the value of the quantity of interest. One key takeaway is that researchers must be more cautious about inattentive respondents when the quantity of interest is expected to be close to zero, because these cases tend to produce larger biases.

Figure 1 Consequences of inattentive respondents. Note: This figure illustrates how the conventional estimator (thick solid line) is biased toward 0.5, whereas the proposed bias-corrected estimator (thin solid line) captures the ground truth (dashed line). Both estimates are shown with bootstrapped 95% confidence intervals (with 1,000 replications). Each panel is based on our simulation in which we set the number of respondents $n=2,000$ , the proportion of a sensitive anchor item $\pi '=0$ , the proportions for nonsensitive items in the crosswise and anchor questions $p=0.15$ and $p'=0.15$ , respectively. The bias increases as the percentage of inattentive respondents increases and the true prevalence rate ( $\pi $ ) decreases. The top-left panel notes parameter values for all six panels (for notation, see the next section).

Figure 1 also speaks to critiques of the crosswise model that have focused on the incidence of false positives, as opposed to bias overall (Höglinger and Diekmann 2017; Höglinger and Jann 2018; Meisters et al. 2020a; Nasirian et al. 2018). While these studies are often agnostic about the source of false positives, the size of the biases in this figure suggests that the main source is likely inattentive respondents.

Several potential solutions to this problem have been recently discussed. The first approach is to identify inattentive survey takers via comprehension checks, remove them from data, and perform estimation and inference on the “cleaned” data (i.e., listwise deletion; Höglinger and Diekmann 2017; Höglinger and Jann 2018; Meisters et al. 2020a). One drawback of this method is that it is often challenging to discover which respondents are being inattentive. Moreover, this approach leads to a biased estimate of the quantity of interest unless researchers make the ignorability assumption that having a sensitive attribute is statistically independent of one’s attentiveness (Alvarez et al. 2019, 148), which is an unreasonably strong assumption in most situations.¹² The second solution detects whether respondents answered the crosswise question randomly via direct questioning and then adjusts the prevalence estimates accordingly (Enzmann 2017; Schnapp 2019). This approach is valid if researchers assume that direct questioning is itself not susceptible to inattentiveness or social desirability bias as well as that the crosswise question does not affect respondents’ answers to the direct question. Such an assumption, however, is highly questionable, and the proposed corrections have several undesirable properties (Online Appendix B). Below, we present an alternative solution to the problem, which yields an unbiased estimate of the quantity of interest with a much weaker set of assumptions than in existing solutions.

3 The Proposed Methodology

3.1 The Setup

Consider a single sensitive question in a survey with n respondents drawn from a finite target population via simple random sampling with replacement. Suppose that there are no missing data and no respondent opts out of the crosswise question. Let $\pi$ (quantity of interest) be the population proportion of individuals who agree with the sensitive statement (e.g., I am willing to bribe a police officer). Let p be the known population proportion of people who agree with the nonsensitive statement (e.g., My mother was born in January). Finally, let $\lambda$ (and $1 - \lambda$) be the population proportion of individuals who choose “both or neither is true” (and “only one statement is true”). Assuming $\pi \mathop {\perp \!\!\!\!\perp } p$, Yu et al. (2008) introduced the following identity as a foundation of the crosswise model:

(1a)

$$ \begin{align} \mathbb{P}(\text{TRUE-TRUE} \cup \text{FALSE-FALSE}) = \lambda = \pi p + (1 - \pi)(1-p). \end{align} $$

Solving the identity with respect to $\pi $ yields $\pi = \frac {\lambda + p - 1}{2p - 1}$ . Based on this identity, the authors proposed the naïve crosswise estimator:

(1b)

$$ \begin{align} \widehat\pi_{CM} = \frac{\widehat\lambda + p - 1}{2p - 1}, \end{align} $$

where $\widehat \lambda $ is the observed proportion of respondents choosing “both or neither is true” and $p \neq 0.5$ .

We call Equation (1b) the naïve estimator, because it does not take into account the presence of inattentive respondents who give random answers in this design. When one or more respondents do not follow the instruction and randomly pick their answers, the proportion must be (generalizing Walzenbach and Hinz 2019, 10):

(1c)

$$ \begin{align} \lambda = \Big\{ \pi p + (1-\pi)(1-p) \Big\}\gamma + \kappa(1-\gamma), \end{align} $$

where $\gamma $ is the proportion of attentive respondents and $\kappa $ is the probability with which inattentive respondents pick “both or neither is true.”

We then quantify the bias in the naïve estimator as follows (Online Appendix A.1):

(1d)

$$ \begin{align} B_{CM} & \equiv \mathbb{E}[\widehat\pi_{CM}] - \pi \end{align} $$

(1e)

$$ \begin{align} & = \Bigg( \frac{1}{2} - \frac{1}{2\gamma} \Bigg)\Bigg( \frac{\lambda - \kappa}{p-\frac{1}{2}} \Bigg). \end{align} $$

Here, $B_{CM}$ is a bias with respect to our quantity of interest caused by inattentive respondents. Under regularity conditions ( $\pi < 0.5$ , $p < \frac {1}{2}$ , and $\lambda> \kappa $ ), which are met in typical crosswise models, the bias term is always positive (Online Appendix A.2). This means that the conventional crosswise estimator always overestimates the population prevalence of sensitive attributes in the presence of inattentive respondents.
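To make Equations (1c) and (1e) concrete, the following sketch evaluates the bias for a few shares of attentive respondents. The function names and the parameter values ($\pi = 0.1$, two values of p) are illustrative assumptions chosen for this example, not values reported by the authors.

```python
def expected_lambda(pi: float, p: float, gamma: float, kappa: float = 0.5) -> float:
    """Equation (1c): population share choosing "both or neither is true"."""
    return (pi * p + (1 - pi) * (1 - p)) * gamma + kappa * (1 - gamma)


def naive_bias(pi: float, p: float, gamma: float, kappa: float = 0.5) -> float:
    """Equation (1e): bias of the naive estimator caused by inattention."""
    lam = expected_lambda(pi, p, gamma, kappa)
    return (0.5 - 1 / (2 * gamma)) * (lam - kappa) / (p - 0.5)


# Illustrative values: pi = 0.1, two values of p, three attentive shares.
for gamma in (1.0, 0.9, 0.7):
    for p in (0.15, 0.30):
        print(f"gamma={gamma:.1f}  p={p:.2f}  bias={naive_bias(0.1, p, gamma):.4f}")
```

When $\kappa = 0.5$, substituting Equation (1c) into Equation (1e) reduces the bias to $(1-\gamma)(\frac{1}{2}-\pi)$. This makes the three properties discussed in the text visible at a glance: the bias is positive whenever $\pi < 0.5$, it grows with the share of inattentive respondents, and it does not depend on p.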

3.2 Bias-Corrected Crosswise Estimator

To address this pervasive issue, we propose the following bias-corrected crosswise estimator:

(2a)

$$ \begin{align} \widehat \pi_{BC} =\widehat\pi_{CM} - \widehat B_{CM}, \end{align} $$

where $\widehat B_{CM}$ is an unbiased estimator of the bias:

(2b)

$$ \begin{align} \widehat B_{CM} = \Bigg( \frac{1}{2} - \frac{1}{2\widehat\gamma} \Bigg)\Bigg( \frac{\widehat\lambda - \frac{1}{2}}{p-\frac{1}{2}} \Bigg), \end{align} $$

and $\widehat{\gamma}$ is the estimated proportion of attentive respondents in the crosswise question (we discuss how to obtain $\widehat{\gamma}$ below).¹³

This bias correction depends on several assumptions. First, we assume that inattentive respondents choose “both or neither is true” with probability 0.5.

Assumption 1 (Random Pick).

Inattentive respondents choose “both or neither is true” with probability 0.5 (i.e., $\kappa =0.5$ ).

Although many studies appear to take Assumption 1 for granted, the survey literature suggests that this assumption may not hold in many situations, because inattentive respondents tend to choose first listed items more than second (or lower) listed ones (Galesic et al. 2008; Krosnick 1991). Nevertheless, it is still possible to design a survey to achieve $\kappa =0.5$ regardless of how inattentive respondents choose items. For example, we can achieve this goal by randomizing the order of the listed items in the crosswise model.
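A small simulation illustrates why randomizing the order of the response options can deliver $\kappa = 0.5$ even when inattentive respondents favor the first listed option. The function `simulate_kappa` and the primacy probability of 0.8 are hypothetical choices for this sketch; it assumes that each respondent sees the two options in an independently randomized order.

```python
import random


def simulate_kappa(first_pick_prob: float, n: int = 100_000, seed: int = 0) -> float:
    """Share of inattentive respondents who end up choosing
    "both or neither is true" when the option order is randomized.

    first_pick_prob: probability that an inattentive respondent picks
                     whichever option is listed first (primacy effect).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n):
        both_neither_first = rng.random() < 0.5       # randomized display order
        picks_first = rng.random() < first_pick_prob  # respondent's primacy tendency
        # "Both or neither" is chosen if it is listed first and the first
        # option is picked, or listed second and the second option is picked.
        if picks_first == both_neither_first:
            hits += 1
    return hits / n


print(simulate_kappa(first_pick_prob=0.8))
```

Analytically, the probability is $0.5q + 0.5(1-q) = 0.5$ for any primacy probability q, so the simulated share should be close to 0.5 regardless of the value passed in.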

The main challenge in estimating the bias is to obtain the estimated proportion of attentive respondents in the crosswise question (i.e., $\widehat \gamma$). We solve this problem by adding an anchor question to the survey. The anchor question uses the same format as the crosswise question, but contains a sensitive statement with known prevalence. Our proposed solution generalizes the idea of “zero-prevalence sensitive items” first introduced by Höglinger and Diekmann (2017).¹⁴ The essence of our approach is to use this additional sensitive statement to (1) estimate the proportion of inattentive respondents in the anchor question and (2) use it to correct for the bias in the crosswise question. For our running example (corruption in Costa Rica), we might consider the following anchor question:

Anchor Question: How many of the following statements are true?

Statement C: I have paid a bribe to be on the top of a waiting list for an organ transplant.

Statement D: My best friend was born in January, February, or March.

• $\underline {\text {Both}}$ statements are true, or $\underline {\text {neither}}$ statement is true.
• $\underline {\text {Only one}}$ statement is true.

Here, Statement C is a sensitive anchor statement whose prevalence is (expected to be) 0.¹⁵ Statement D is a nonsensitive statement whose population prevalence is known to researchers just like the nonsensitive statement in the crosswise question. In addition, “Refuse to Answer” or “Don’t Know” may be included.

Let $\pi '$ be the known proportion for Statement C and $p'$ be the known proportion for Statement D (the “prime symbol” indicates the anchor question). Let $\lambda '$ (and 1 - $\lambda '$ ) be the population proportion of people selecting “both or neither is true” (and “only one statement is true”). Let $\gamma '$ be the population proportion of attentive respondents in the anchor question. The population proportion of respondents choosing “both or neither is true” in the anchor question then becomes:

(3a)

$$ \begin{align} \lambda' = \Big\{ \pi' p' + (1-\pi')(1-p') \Big\}\gamma' + \kappa(1-\gamma'). \end{align} $$

Assuming $\kappa =0.5$ (Assumption 1) and $\pi '=0$ (zero-prevalence), we can rearrange Equation (3a) as:

(3b)

$$ \begin{align} \gamma' = \frac{\lambda' - \frac{1}{2}}{\pi'p'+(1-\pi')(1-p')-\frac{1}{2}} = \frac{\lambda' - \frac{1}{2}}{\frac{1}{2}-p'}. \end{align} $$

We can then estimate the proportion of attentive respondents in the anchor question as:

(3c)

$$ \begin{align} \widehat\gamma' = \frac{\widehat\lambda' - \frac{1}{2}}{\frac{1}{2}-p'}, \end{align} $$

where $\widehat{\lambda}'$ is the observed proportion of “both or neither is true” and $\mathbb{E}[\widehat{\lambda}']=\lambda'$ (Online Appendix A.6).
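Equation (3c) amounts to a one-line calculation. The sketch below uses a hypothetical function name, `attentive_share`, and made-up inputs; it assumes $\kappa = 0.5$ and a zero-prevalence anchor ($\pi' = 0$), as in the derivation above.

```python
def attentive_share(lambda_anchor_hat: float, p_anchor: float) -> float:
    """Equation (3c): estimated share of attentive respondents.

    lambda_anchor_hat: observed share answering "both or neither is true"
                       in the anchor question.
    p_anchor:          known prevalence of the nonsensitive anchor statement.
    Assumes kappa = 0.5 and pi' = 0.
    """
    if p_anchor == 0.5:
        raise ValueError("p_anchor must differ from 0.5")
    return (lambda_anchor_hat - 0.5) / (0.5 - p_anchor)


# Made-up inputs: with p' = 0.25, fully attentive respondents would choose
# "both or neither" 75% of the time; an observed 70% implies some inattention.
print(round(attentive_share(lambda_anchor_hat=0.70, p_anchor=0.25), 3))
```

In this made-up case the estimate is 0.8: a population in which 80% answer attentively (choosing “both or neither” with probability 0.75) and 20% answer at random produces $0.75 \times 0.8 + 0.5 \times 0.2 = 0.70$.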

Finally, our strategy is to use $\widehat \gamma '$ (obtained from the anchor question) as an estimate of $\gamma $ (the proportion of attentive respondents in the crosswise question) and plug it into Equation (2b) to estimate the bias. This final step yields the complete form of our bias-corrected estimator:

(3d)

$$ \begin{align} \widehat \pi_{BC} =\widehat\pi_{CM} - \underbrace{\Bigg( \frac{1}{2} - \frac{1}{2}\Bigg[\frac{\frac{1}{2}-p'}{\widehat{\lambda}' - \frac{1}{2}}\Bigg] \Bigg)\Bigg( \frac{\widehat\lambda - \frac{1}{2}}{p-\frac{1}{2}} \Bigg) }_{\text{Estimated Bias: } \widehat{B}_{CM} (\widehat{\lambda}, \widehat{\lambda}', p, p') }, \end{align} $$

where it is clear that our estimator depends both on the crosswise question ( $\widehat {\lambda }$ and p) and the anchor question ( $\widehat {\lambda }'$ and $p'$ ). For the proposed estimator to be unbiased, we need to make two assumptions (Online Appendix A.7):

Assumption 2 (Attention Consistency).

The proportion of attentive respondents does not change across the crosswise and anchor questions (i.e., $\gamma = \gamma '$ ).

Assumption 3 (No Carryover).

The crosswise question does not affect respondents’ answers to the anchor question and vice versa (and thus $\mathbb {E}[\widehat {\lambda }]=\lambda $ and $\mathbb {E}[\widehat {\lambda }']=\lambda '$ ).

Assumption 2 will be violated, for example, if the crosswise question has a higher level of inattention than the anchor question. Assumption 3 will be violated, for instance, if asking the anchor question makes some respondents more willing to bribe a police officer in our running example.

Importantly, researchers can design their surveys to make these assumptions more plausible. For example, they can do so by randomizing the order of the anchor and crosswise questions, making them look alike, and using a statement for the anchor question that addresses the same topic and is equally sensitive as the one in the crosswise question. These considerations also help to satisfy Assumption 3 (we provide more examples and practical advice on how to do this in Online Appendix D).
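Putting the pieces together, Equation (3d) can be sketched as a single function. The name `bias_corrected` and the numerical inputs are illustrative assumptions rather than part of the authors' software; the sketch relies on Assumptions 1–3 and on a zero-prevalence anchor.

```python
def bias_corrected(lambda_hat: float, lambda_anchor_hat: float,
                   p: float, p_anchor: float) -> float:
    """Equation (3d): bias-corrected crosswise estimate.

    lambda_hat:        share answering "both or neither" in the crosswise question.
    lambda_anchor_hat: share answering "both or neither" in the anchor question.
    p, p_anchor:       known prevalences of the two nonsensitive statements.
    Assumes kappa = 0.5, pi' = 0, and gamma = gamma'.
    """
    naive = (lambda_hat + p - 1) / (2 * p - 1)                  # Equation (1b)
    gamma_hat = (lambda_anchor_hat - 0.5) / (0.5 - p_anchor)    # Equation (3c)
    bias_hat = (0.5 - 1 / (2 * gamma_hat)) * (lambda_hat - 0.5) / (p - 0.5)  # Equation (2b)
    return naive - bias_hat


# Made-up population: pi = 0.2, p = p' = 0.25, and 80% attentive respondents,
# which implies lambda = 0.62 and lambda' = 0.70 by Equations (1c) and (3a).
print(round(bias_corrected(0.62, 0.70, p=0.25, p_anchor=0.25), 3))
```

In this made-up population the naïve estimate would be 0.26, while the corrected estimate recovers the assumed prevalence of 0.2.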

Finally, we derive the variance of the bias-corrected crosswise estimator and its sample analog as follows (Online Appendix A.4):

(4a)

$$ \begin{align} \mathbb{V}(\widehat \pi_{BC}) = \mathbb{V} \Bigg[\frac{\widehat{\lambda}}{\widehat{\lambda}'} \Bigg(\frac{\frac{1}{2}-p'}{2p-1} \Bigg) \Bigg] \quad \text{and} \quad \widehat{\mathbb{V}}(\widehat \pi_{BC}) = \widehat{\mathbb{V}} \Bigg[\frac{\widehat{\lambda}}{\widehat{\lambda}'} \Bigg(\frac{\frac{1}{2}-p'}{2p-1} \Bigg) \Bigg]. \end{align} $$

Note that these variances are necessarily larger than those of the conventional estimator, as the bias-corrected estimator also needs to estimate the proportion of (in)attentive respondents from data. Since no closed-form solutions to these variances are available, we employ the bootstrap to construct confidence intervals.
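The text does not say which bootstrap variant is used, so the following is only a sketch under stated assumptions: it resamples respondents with replacement (keeping each respondent's crosswise and anchor answers together) and reports a percentile interval. The function names and the toy data are hypothetical.

```python
import random


def bias_corrected(lam_hat, lam_anchor_hat, p, p_anchor):
    """Equation (3d), assuming kappa = 0.5 and pi' = 0."""
    naive = (lam_hat + p - 1) / (2 * p - 1)
    gamma_hat = (lam_anchor_hat - 0.5) / (0.5 - p_anchor)
    bias_hat = (0.5 - 1 / (2 * gamma_hat)) * (lam_hat - 0.5) / (p - 0.5)
    return naive - bias_hat


def bootstrap_ci(y, y_anchor, p, p_anchor, reps=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval (an assumed variant).

    y, y_anchor: lists of 0/1 answers per respondent, where 1 means
                 "both or neither is true"; index i is the same respondent.
    """
    rng = random.Random(seed)
    n = len(y)
    draws = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]   # resample respondents
        lam_hat = sum(y[i] for i in idx) / n
        lam_anchor_hat = sum(y_anchor[i] for i in idx) / n
        if lam_anchor_hat == 0.5:
            continue  # gamma_hat would be 0 and the estimator undefined
        draws.append(bias_corrected(lam_hat, lam_anchor_hat, p, p_anchor))
    draws.sort()
    lower = draws[int(alpha / 2 * len(draws))]
    upper = draws[min(len(draws) - 1, int((1 - alpha / 2) * len(draws)))]
    return lower, upper


# Toy data: 1,000 respondents, 62% and 70% "both or neither" answers.
y = [1] * 620 + [0] * 380
y_anchor = [1] * 700 + [0] * 300
print(bootstrap_ci(y, y_anchor, p=0.25, p_anchor=0.25))
```

Resampling whole respondents preserves any dependence between a respondent's two answers; other bootstrap schemes are possible and may give somewhat different intervals.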

4 Simulation Studies

To examine the finite sample performance of the bias-corrected estimator, we repeat the simulations shown in Figure 1 a total of 8,000 times. In each simulation, we draw $\pi$ from the continuous uniform distribution on (0.1, 0.45), p and $p'$ from the continuous uniform distribution on (0.088, 0.333) (reflecting the smallest and largest values in existing studies), and $\gamma$ from the continuous uniform distribution on (0.5, 1). We run these simulations for sample sizes of 200, 500, 1,000, 2,000, and 5,000 and evaluate the results. Figure 2 demonstrates that the bias-corrected estimator has a significantly lower bias, smaller root-mean-square error, and higher coverage than the naïve estimator.

Figure 2 Finite sample performance of the naïve and bias-corrected estimators. Note: This figure displays the bias, root-mean-square error, and the coverage of the 95% confidence interval of the naïve and bias-corrected estimators. The bias-corrected estimator is unbiased and consistent and has an ideal level of coverage.

The recent survey literature suggests that researchers be cautious when inattentive respondents have different outcome profiles (e.g., political interest) than attentive respondents (Alvarez et al. 2019; Berinsky et al. 2014). Paying attention to this issue is especially important when dealing with sensitive questions, since respondents who have sensitive attributes may be more likely to give random answers than other respondents. To investigate whether our correction is robust to such an association, we replicate our simulations in Figure 1 by varying the prevalence of sensitive attributes among inattentive respondents while holding the prevalence among attentive respondents constant. Figure 3 shows that our bias-corrected estimator properly captures the true prevalence rate of sensitive attributes regardless of the degree of association between inattentiveness and possession of sensitive attributes. In contrast, the naïve estimator does not capture the ground truth when more than about 10% of respondents are inattentive. More simulation results are reported in Online Appendix B.

Figure 3 When respondents with sensitive attributes tend to be more inattentive. Note: This graph illustrates the naïve and bias-corrected estimators with 95% confidence intervals when the prevalence of sensitive attributes among inattentive respondents ( $\pi _{\text {inattentive}}$ ) is higher than that among attentive respondents ( $\pi _{\text {attentive}}$ ) with simulated data (see the top-middle panel for parameter values). Each panel is based on our simulation in which we set the number of respondents $n=2,000$ , the proportion of a sensitive anchor item $\pi '=0$ , the proportions for nonsensitive items in the crosswise and anchor questions $p=0$ and $p'=0$ , respectively. The bias-corrected estimator captures the ground truth (dashed line) even when respondents with sensitive attributes tend to be more inattentive.

5 Extensions of the Bias-Corrected Estimator

5.1 Sensitivity Analysis

While our bias-corrected estimator requires the anchor question, it may not always be available. For such surveys, we propose a sensitivity analysis that shows researchers the sensitivity of their naïve estimates to inattentive respondents and what assumptions they must make to preserve their original conclusions. Specifically, it offers sensitivity bounds for original crosswise estimates by applying the bias correction under varying levels of inattentive respondents. To illustrate, we attempted to apply our sensitivity analysis to all published crosswise studies of sensitive behaviors from 2008 to the present (49 estimates reported in 21 original studies). Figure 4 visualizes the sensitivity bounds for selected studies (see Online Appendix C.1 for full results). For each study, we plot the bias-corrected estimates against varying percentages of inattentive respondents under Assumption 1. Our sensitivity analysis suggests that many studies would not find any statistically significant difference between direct questioning and the crosswise model (echoing Höglinger and Diekmann 2017, 135) unless they assume that less than 20% of the respondents were inattentive.¹⁶
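The logic of the sensitivity analysis can be illustrated with a short sketch. It is not the cWise implementation. It takes a published naïve estimate and the design parameter p, backs out the observed proportion of "both or neither is true" answers, and re-applies the correction for each assumed share of inattentive respondents. The function name is hypothetical, and `kappa` is assumed here to be the probability that an inattentive respondent selects "both or neither is true" (0.5 under random answering).

```python
def sensitivity_bounds(pi_naive, p, inattentive_shares, kappa=0.5):
    """Bias-corrected estimates for assumed shares of inattentive respondents.

    Each share must be smaller than 1. Returns a list of (share, estimate).
    """
    # Back out the observed proportion implied by the naive estimate.
    lam = pi_naive * (2 * p - 1) + 1 - p
    results = []
    for share in inattentive_shares:
        gamma = 1.0 - share  # assumed share of attentive respondents
        # Proportion of "both or neither" answers among attentive respondents.
        attentive_rate = (lam - kappa * share) / gamma
        pi_bc = (attentive_rate + p - 1) / (2 * p - 1)
        # Logical constraint: keep the estimate inside [0, 1].
        results.append((share, min(1.0, max(0.0, pi_bc))))
    return results
```

For example, `sensitivity_bounds(0.3, 0.25, [0.0, 0.1, 0.2, 0.3])` traces how a naïve estimate of 0.3 moves away from 0.5 as the assumed share of inattentive respondents grows. With `kappa=0.5` the correction reduces to $\frac{1}{2} + (\widehat\pi_{CM} - \frac{1}{2})/\gamma$.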

Figure 4 Sensitivity analysis of previous crosswise estimates. Note: This figure plots bias-corrected estimates of the crosswise model over varying percentages of inattentive respondents with the estimate based on direct questioning reported in each study.

5.2 Weighting

While the literature on the crosswise model usually assumes that survey respondents are drawn from a finite target population via simple random sampling with replacement, a growing share of surveys are administered with unrepresentative samples such as online opt-in samples (Franco et al. 2017; Mercer, Lau, and Kennedy 2018). Online opt-in samples are known to be often unrepresentative of the entire population that researchers want to study (Bowyer and Rogowski 2017; Malhotra and Krosnick 2007), and analysts using such samples may wish to use weighting to extend their inferences into the population of real interest. In this light, we propose a simple way to include sample weights in the bias-corrected estimator. Online Appendix C.2 presents our theoretical and simulation results. The key idea is that we can apply a Horvitz–Thompson-type estimator (Horvitz and Thompson 1952) to the observed proportions in the crosswise and anchor questions (for a similar result without bias correction, see Chaudhuri (2012, 380) and Quatember (2019, 270)).
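A rough sketch of this idea is given below. The exact estimator is defined in Online Appendix C.2 and is not reproduced here, so the code should be read as one plausible reading rather than the authors' formula. It replaces the two observed proportions with weight-normalized proportions and then applies the correction under a zero-prevalence anchor ($\pi'=0$). The function names and the normalization by the sum of weights are assumptions of the sketch.

```python
def weighted_mean(values, weights):
    """Weight-normalized proportion of 1s in a 0/1 response vector."""
    total = sum(weights)
    return sum(w * v for v, w in zip(values, weights)) / total


def weighted_bias_corrected(y, a, weights, p, p_prime):
    """Bias-corrected estimate using weighted observed proportions."""
    lam = weighted_mean(y, weights)        # crosswise question
    lam_prime = weighted_mean(a, weights)  # anchor question
    gamma_hat = (lam_prime - 0.5) / (0.5 - p_prime)
    if gamma_hat <= 0:
        raise ValueError("estimated share of attentive respondents is not positive")
    pi_hat = 0.5 + (lam - 0.5) / ((2 * p - 1) * gamma_hat)
    # Logical constraint: keep the estimate inside [0, 1].
    return min(1.0, max(0.0, pi_hat))
```

With equal weights the function returns the unweighted bias-corrected estimate, which gives a quick sanity check before applying survey weights.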

5.3 Multivariate Regressions for the Crosswise Model

Political scientists often wish to not only estimate the prevalence of sensitive attributes (e.g., corruption in a legislature), but also analyze what kinds of individuals (e.g., politicians) are more likely to have the sensitive attribute and probe whether having the sensitive attribute is associated with another outcome (e.g., reelection). In this vein, regression models for the traditional crosswise model have been proposed in several studies (Gingerich et al. 2016; Jann et al. 2012; Korndörfer, Krumpal, and Schmukle 2014; Vakilian, Mousavi, and Keramat 2014). Our contribution is to further extend such a framework by (1) enabling analysts to use the latent sensitive attribute both as an outcome and a predictor while (2) applying our bias correction. Our software can easily implement these regressions while also offering simple ways to perform post-estimation simulation (e.g., generating predicted probabilities with 95% confidence intervals).

We first introduce multivariate regressions in which the latent variable for having a sensitive attribute is used as an outcome variable. Let $Z_{i}\in \{0,1\}$ be a binary variable denoting if respondent i has a sensitive attribute and $T_{i}\in \{0,1\}$ be a binary variable denoting if the same respondent is attentive. Both of these quantities are unobserved latent variables. We define the regression model (conditional expectation) of interest as:

(5a)

$$ \begin{align} \mathbb{E}[Z_{i}|\textbf{X}_{\textbf{i}}=\textbf{x}] = \mathbb{P}(Z_{i}=1|\textbf{X}_{\textbf{i}}=\textbf{x}) = \pi_{\boldsymbol{\beta}}(\textbf{x}), \end{align} $$

where $\textbf {X}_{\textbf{i}}$ is a random vector of respondent i’s characteristics, $\textbf {x}$ is a vector of realized values of such covariates, and $\boldsymbol {\beta }$ is a vector of unknown parameters that associate these characteristics with the probability of having the sensitive attribute. Our goal is to make inferences about these unknown parameters and to use estimated coefficients to produce predictions.

To apply our bias correction, we also introduce the following conditional expectation for being attentive:

(5b)

$$ \begin{align} \mathbb{E}[T_{i}|\textbf{X}_{\textbf{i}}=\textbf{x}] = \mathbb{P}(T_{i}=1|\textbf{X}_{\textbf{i}}=\textbf{x}) = \gamma_{\boldsymbol{\theta}}(\textbf{x}), \end{align} $$

where $\boldsymbol {\theta }$ is a vector of unknown parameters that associate the same respondent’s characteristics with the probability of being attentive. We then assume that $\pi _{\boldsymbol {\beta }}(\textbf {x}) = \text {logit}^{-1}(\boldsymbol {\beta } \textbf {X}_{\textbf{i}})$ and $\gamma _{\boldsymbol {\theta }}(\textbf {x}) = \text {logit}^{-1}(\boldsymbol {\theta } \textbf {X}_{\textbf{i}})$ , acknowledging that other functional forms are also possible.

Next, we substitute these quantities into Equation (1c) by assuming $\pi '=0$ (zero-prevalence for Statement C):

(6a)

$$ \begin{align} \lambda_{\boldsymbol{\beta,\theta}}(\textbf{X}_{\textbf{i}}) & = \underbrace{\Big(\pi_{\boldsymbol{\beta}}(\textbf{X}_{\textbf{i}}) p + (1-\pi_{\boldsymbol{\beta}}(\textbf{X}_{\textbf{i}}))(1-p)\Big) \gamma_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}}) + \frac{1}{2}\Big(1-\gamma_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}})\Big)}_{\text{Conditional probability of choosing "both or neither is true" in the crosswise question}}, \end{align} $$

(6b)

$$ \begin{align} \lambda^{\prime}_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}}) & = \underbrace{ \Big( \frac{1}{2}-p' \Big)\gamma_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}}) + \frac{1}{2}}_{\text{Conditional probability of choosing "both or neither is true" in the anchor question}}. \end{align} $$

Finally, let $Y_{i}\in \{0,1\}$ and $A_{i}\in \{0,1\}$ be observed variables denoting whether respondent i chooses “both or neither is true” in the crosswise and anchor questions, respectively. Assuming $Y_{i}\mathop {\perp \!\!\!\!\perp } A_{i}|\textbf {X}_{\textbf{i}}$ together with Assumptions 1–3, we model $Y_{i}$ and $A_{i}$ as independent Bernoulli random variables with success probabilities $\lambda _{\boldsymbol {\beta ,\theta }}(\textbf {X}_{\textbf{i}})$ and $\lambda ^{\prime }_{\boldsymbol {\theta }}(\textbf {X}_{\textbf{i}})$ , respectively, and construct the following likelihood function:

(7)

$$ \begin{align} \mathcal{L}(\boldsymbol{\beta}, \boldsymbol{\theta}|\{\textbf{X}_{\textbf{i}}, Y_{i}, A_{i}\}_{i=1}^{n}, p, p') & \propto \prod_{i=1}^{n} \Big\{ \lambda_{\boldsymbol{\beta,\theta}}(\textbf{X}_{\textbf{i}}) \Big\}^{Y_{i}} \Big\{ 1 - \lambda_{\boldsymbol{\beta,\theta}}(\textbf{X}_{\textbf{i}}) \Big\}^{1-Y_{i}} \Big\{ \lambda^{\prime}_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}}) \Big\}^{A_{i}}\Big\{ 1 - \lambda^{\prime}_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}}) \Big\}^{1 - A_{i}}\nonumber\\ & = \prod_{i=1}^{n} \Big\{ \Big(\pi_{\boldsymbol{\beta}}(\textbf{X}_{\textbf{i}}) p + (1-\pi_{\boldsymbol{\beta}}(\textbf{X}_{\textbf{i}}))(1-p)\Big) \gamma_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}}) + \frac{1}{2}\Big(1-\gamma_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}})\Big) \Big\}^{Y_{i}}\nonumber\\ & \quad \ \ \ \times \Big\{ 1 - \Big[\Big(\pi_{\boldsymbol{\beta}}(\textbf{X}_{\textbf{i}}) p + (1-\pi_{\boldsymbol{\beta}}(\textbf{X}_{\textbf{i}}))(1-p)\Big) \gamma_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}}) + \frac{1}{2}\Big(1-\gamma_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}})\Big) \Big] \Big\}^{1-Y_{i}}\nonumber\\ & \quad \ \ \ \times \Big\{ \Big( \frac{1}{2}-p' \Big)\gamma_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}}) + \frac{1}{2} \Big\}^{A_{i}}\nonumber\\ & \quad \ \ \ \times \Big\{ 1 - \Big[ \Big( \frac{1}{2}-p' \Big)\gamma_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}}) + \frac{1}{2} \Big] \Big\}^{1 - A_{i}}\nonumber\\ & = \prod_{i=1}^{n} \Big\{ \Big((2p-1)\pi_{\boldsymbol{\beta}}(\textbf{X}_{\textbf{i}}) + \Big(\frac{1}{2}-p \Big)\Big) \gamma_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}}) + \frac{1}{2} \Big\}^{Y_{i}}\nonumber\\ & \quad \ \ \ \times \Big\{ 1 - \Big[ \Big((2p-1)\pi_{\boldsymbol{\beta}}(\textbf{X}_{\textbf{i}}) + \Big(\frac{1}{2}-p \Big)\Big) \gamma_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}}) + \frac{1}{2} \Big] \Big\}^{1-Y_{i}}\nonumber\\ & \quad \ \ \ \times \Big\{ \Big( \frac{1}{2}-p' 
\Big)\gamma_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}}) + \frac{1}{2} \Big\}^{A_{i}}\nonumber\\ & \quad \ \ \ \times \Big\{ 1 - \Big[ \Big( \frac{1}{2}-p' \Big)\gamma_{\boldsymbol{\theta}}(\textbf{X}_{\textbf{i}}) + \frac{1}{2} \Big] \Big\}^{1 - A_{i}}. \end{align} $$

Our simulations show that estimating the above model can recover both primary ( $\boldsymbol {\beta }$ ) and auxiliary parameters ( $\boldsymbol {\theta }$ ) (Online Appendix C.3). Online Appendix C.4 presents multivariate regressions in which the latent variable for having a sensitive attribute is used as a predictor.
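To make the likelihood in Equation (7) concrete, the following sketch evaluates its logarithm for given parameter vectors. It is a plain-Python illustration under the logit specification and the zero-prevalence anchor assumed above, not the estimation routine in cWise. The function names and the list-of-lists layout for the covariates are hypothetical choices.

```python
import math


def inv_logit(v):
    """Inverse logit link."""
    return 1.0 / (1.0 + math.exp(-v))


def log_likelihood(beta, theta, X, Y, A, p, p_prime):
    """Log of the likelihood in Equation (7), up to an additive constant.

    X is a list of covariate vectors (include a leading 1.0 for an
    intercept); Y and A are 0/1 indicators for choosing "both or neither
    is true" in the crosswise and anchor questions.
    """
    total = 0.0
    for x, y, a in zip(X, Y, A):
        pi = inv_logit(sum(b * v for b, v in zip(beta, x)))      # P(Z_i = 1 | x)
        gamma = inv_logit(sum(t * v for t, v in zip(theta, x)))  # P(T_i = 1 | x)
        # Success probabilities from Equations (6a) and (6b), simplified form.
        lam = ((2 * p - 1) * pi + (0.5 - p)) * gamma + 0.5
        lam_prime = (0.5 - p_prime) * gamma + 0.5
        total += y * math.log(lam) + (1 - y) * math.log(1 - lam)
        total += a * math.log(lam_prime) + (1 - a) * math.log(1 - lam_prime)
    return total
```

Estimates of $\boldsymbol{\beta}$ and $\boldsymbol{\theta}$ could then be obtained by passing the negative of this function to a general-purpose numerical optimizer. For $0 < p, p' < 0.5$ both success probabilities stay strictly inside (0, 1), so the logarithms are well defined.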

5.4 Sample Size Determination and Parameter Selection

When using the crosswise model with our procedure, researchers may wish to choose the sample size and specify other design parameters (i.e., $\pi '$ , p, and $p'$ ) so that they can obtain (1) high statistical power for hypothesis testing and/or (2) narrow confidence intervals for precise estimation. To fulfill these needs, we develop power analysis and data simulation tools appropriate for our bias-corrected estimator (Online Appendix C.5).

6 Concluding Remarks

The crosswise model is a simple but powerful survey-based tool for investigating the prevalence of sensitive attitudes and behavior. To overcome two limitations of the design, we proposed a simple design-based solution using an anchor question. We also provided several extensions of our proposed bias-corrected estimator. Future research could further extend our methodology by applying it to nonbinary sensitive questions, allowing for multiple sensitive statements and/or anchor questions, handling missing data more efficiently, and integrating our method with more efficient sampling schemes (e.g., Reiber, Schnuerch, and Ulrich 2020).

With these developments, we hope to facilitate the wider adoption of the (bias-corrected) crosswise model in political science, as it may have several advantages over traditional randomized response techniques, list experiments, and endorsement experiments. Future research should also explore how to compare and combine results from the crosswise model and these other techniques.

Our work also speaks to recent scholarship on inattentive respondents in self-administered surveys. While it is increasingly common for political scientists to use self-administered surveys, the most common way of dealing with inattentive respondents is to try to directly identify them through “screener”-type questions. However, a growing number of respondents are experienced survey-takers who may recognize and avoid such “traps” (Alvarez et al. 2019). Our method is one example of a different approach to handling inattentive respondents that is less likely to be recognized by respondents, works without measuring individual-level inattentiveness, and does not need to drop inattentive respondents from the sample. Future research should explore whether a similar approach to inattention would work in other question formats.

Acknowledgments

A previous version of this manuscript was titled “Bias-Corrected Crosswise Estimators for Sensitive Inquiries.” For helpful comments and valuable feedback, we would like to thank Graeme Blair, Gustavo Guajardo, Dongzhou Huang, Colin Jones, Gary King, Shiro Kuriwaki, Jeff Lewis, John Londregan, Yui Nishimura, Michelle Torres, and three anonymous reviewers.

Data Availability Statement

Replication code for this article is available at https://doi.org/10.7910/DVN/AHWMIL (Atsusaka and Stevenson 2021). An open-source R package, cWise: A (Cross)Wise Method to Analyze Sensitive Survey Questions, which implements our methods, is available at https://github.com/YukiAtsusaka/cWise.

Supplementary Material

For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2021.43.

Footnotes

Edited by Jeff Gill

1 Such topics include racial animus (Kuklinski, Cobb, and Gilens 1997), attitudes toward non-Christian candidates (Kane, Craig, and Wald 2004), support for militant organizations (Lyall, Blair, and Imai 2013), support for authoritarian regimes (Frye et al. 2017), voting for right-wing populist parties (Lehrer, Juhl, and Gschwend 2019), and vote buying (Cruz 2019).

2 For a comparative and validation study for these methods, see Rosenfeld, Imai, and Shapiro (2016).

3 About 60 studies related to the design were published between 2016 and 2021 across disciplines.

4 This idea of intentionally injecting statistically tractable noise originated from the randomized response technique, which was inherited by the literature on differential privacy (Evans and King 2021).

5 One strength of the design is that there is not a clear best strategy for noncooperators and so by including some simple design features (discussed below), most noncooperative response strategies can be made to be “as if” random. As such, we refer to any such strategy as random or “inattentive.”

6 Several other studies have contributed important work relevant to building a solution, though each stops short of actually offering one (Höglinger and Diekmann 2017; Höglinger and Jann 2018; Meisters et al. 2020a; Walzenbach and Hinz 2019).

7 Often, this proportion is less than 0.5, and direct questioning is expected to underestimate the quantity. Thus, we assume that the quantity of interest is always less than 0.5 in the rest of the argument.

8 Prevalence rates for nonsensitive questions can come from census data or other kinds of statistical regularities like the Newcomb–Benford law (Kundt 2014). Furthermore, in Online Appendix C.6, we present an approach that relies on a virtual die roll but that overcomes the respondent’s natural skepticism that such information will be recorded.

9 We only consider the simplest (i.e., benchmark) design for each technique.

10 The crosswise model has been implemented in more than eight countries with different platforms and formats, and inattention has been a concern in many of these studies.

11 Several studies have conjectured that the presence of inattentive respondents biases the point estimate toward 0.5, such as Appendix 7 of John et al. (2018), Figure C.4 in the Online Appendix of Höglinger and Diekmann (2017), Heck et al. (2018, 1899), Enzmann (2017), and Schnapp (2019, 311).

12 For a more general treatment of the consequences of dropping inattentive respondents in experiments, see Aronow, Baron, and Pinson (2019).

13 Following Yu et al. (2008), we impose a logical constraint: $\widehat \pi _{BC} = \min (1, \max (0,\widehat \pi _{BC})).$ As shown below, $\mathbb {E}[\widehat \lambda ] = \lambda $ and $\mathbb {E}[\widehat \gamma ] = \gamma $ , and thus, by the linearity of the expected value operator, $\mathbb {E}[\widehat B_{CM}] = B_{CM}$ .

14 More generally, previous research has used sensitive items with known prevalence only for validation purposes. For example, Höglinger and Diekmann (2017) use receiving an organ and the history of having a rare disease as zero-prevalence sensitive items in the survey about “organ donation and health.” Similarly, Rosenfeld et al. (2016) employ official county-level results of an anti-abortion referendum, whereas Kuhn and Vivyan (2018) rely on official turnout records as sensitive items with known prevalence.

15 More generally, the anchor question can feature any sensitive item whose true prevalence is known (i.e., it need not be 0), although zero-prevalence sensitive items lead to more efficient estimates.

16 Our software also allows researchers to set values of $\kappa $ other than 0.5 depending on the nature of their surveys.

References

Ahlquist, J. S. 2018. “List Experiment Design, Non-Strategic Respondent Error, and Item Count Technique Estimators.” Political Analysis 26 (1): 34–53.

Alvarez, R. M., Atkeson, L. R., Levin, I., and Li, Y. 2019. “Paying Attention to Inattentive Survey Respondents.” Political Analysis 27 (2): 145–162. https://doi.org/10.1017/pan.2018.57

Aronow, P. M., Baron, J., and Pinson, L. 2019. “A Note on Dropping Experimental Subjects Who Fail a Manipulation Check.” Political Analysis 27 (4): 572–589.

Atsusaka, Y., and Stevenson, R. T. 2021. “Replication Data for: A Bias-Corrected Estimator for the Crosswise Model with Inattentive Respondents.” https://doi.org/10.7910/DVN/AHWMIL, Harvard Dataverse, V1, UNF:6:MzsjbSWLP+p82SZ/w8bjcQ== [fileUNF].

Berinsky, A. J., Margolis, M. F., and Sances, M. W. 2014. “Separating the Shirkers from the Workers? Making Sure Respondents Pay Attention on Self-Administered Surveys.” American Journal of Political Science 58 (3): 739–753.

Blair, G., Chou, W., and Imai, K. 2019. “List Experiments with Measurement Error.” Political Analysis 27 (4): 455–480.

Blair, G., Coppock, A., and Moor, M. 2020. “When to Worry About Sensitivity Bias: A Social Reference Theory and Evidence from 30 Years of List Experiments.” American Political Science Review 114 (4): 1297–1315.

Blair, G., Imai, K., and Lyall, J. 2014. “Comparing and Combining List and Endorsement Experiments: Evidence from Afghanistan.” American Journal of Political Science 58 (4): 1043–1063. https://doi.org/10.1111/ajps.12086

Blair, G., Imai, K., and Zhou, Y.-Y. 2015. “Design and Analysis of the Randomized Response Technique.” Journal of the American Statistical Association 110 (511): 1304–1319.

Bowyer, B. T., and Rogowski, J. C. 2017. “Mode Matters: Evaluating Response Comparability in a Mixed-Mode Survey.” Political Science Research and Methods 5 (2): 295. https://doi.org/10.1017/psrm.2015.28

Chaudhuri, A. 2012. “Unbiased Estimation of a Sensitive Proportion in General Sampling by Three Nonrandomized.” Journal of Statistical Theory and Practice 6 (2): 376–381.

Corbacho, A., Gingerich, D. W., Oliveros, V., and Ruiz-Vega, M. 2016. “Corruption as a Self-Fulfilling Prophecy: Evidence from a Survey Experiment in Costa Rica.” American Journal of Political Science 60 (4): 1077–1092.

Cruz, C. 2019. “Social Networks and the Targeting of Vote Buying.” Comparative Political Studies 52 (3): 382–411.

Enzmann, D. 2017. “Chapter 10: Die Anwendbarkeit Des Crosswise-Modells Zur Prüfung Kultureller Unterschiede Sozial erwünschten Antwortverhaltens.” In Methodische Probleme von Mixed-Mode-Ansätzen in der Umfrageforschung, edited by Eifler, S. and Faulbaum, F., 239–277. Wiesbaden: Springer VS.

Evans, G., and King, G. 2021. “Statistically Valid Inferences from Differentially Private Data Releases, with Application to the Facebook URLs Dataset.” Political Analysis, to appear.

Franco, A., Malhotra, N., Simonovits, G., and Zigerell, L. 2017. “Developing Standards for Post-Hoc Weighting in Population-Based Survey Experiments.” Journal of Experimental Political Science 4 (2): 161–172.

Frye, T., Gehlbach, S., Marquardt, K. L., and Reuter, O. J. 2017. “Is Putin’s Popularity Real?” Post-Soviet Affairs 33 (1): 1–15.

Galesic, M., Tourangeau, R., Couper, M. P., and Conrad, F. G. 2008. “Eye-Tracking Data: New Insights on Response Order Effects and Other Cognitive Shortcuts in Survey Responding.” Public Opinion Quarterly 72 (5): 892–913. https://doi.org/10.1093/poq/nfn059

Gingerich, D. W., Oliveros, V., Corbacho, A., and Ruiz-Vega, M. 2016. “When to Protect? Using the Crosswise Model to Integrate Protected and Direct Responses in Surveys of Sensitive Behavior.” Political Analysis 24 (2): 132–156. https://doi.org/10.1093/pan/mpv034

Heck, D. W., Hoffmann, A., and Moshagen, M. 2018. “Detecting Nonadherence Without Loss in Efficiency: A Simple Extension of the Crosswise Model.” Behavior Research Methods 50 (5): 1895–1905.

Hoffmann, A., Diedenhofen, B., Verschuere, B., and Musch, J. 2015. “A Strong Validation of the Crosswise Model Using Experimentally-Induced Cheating Behavior.” Experimental Psychology 62 (6): 403–414. https://doi.org/10.1027/1618-3169/a000304

Hoffmann, A., Meisters, J., and Musch, J. 2020. “On the Validity of Non-Randomized Response Techniques: An Experimental Comparison of the Crosswise Model and the Triangular Model.” Behavior Research Methods 52: 1768–1782.

Hoffmann, A., and Musch, J. 2016. “Assessing the Validity of Two Indirect Questioning Techniques: A Stochastic Lie Detector Versus the Crosswise Model.” Behavior Research Methods 48 (3): 1032–1046. https://doi.org/10.3758/s13428-015-0628-6

Hoffmann, A., and Musch, J. 2019. “Prejudice Against Women Leaders: Insights from an Indirect Questioning Approach.” Sex Roles 80 (11): 681–692.

Höglinger, M., and Diekmann, A. 2017. “Uncovering a Blind Spot in Sensitive Question Research: False Positives Undermine the Crosswise-Model RRT.” Political Analysis 25 (1): 131–137.

Höglinger, M., and Jann, B. 2018. “More Is Not Always Better: An Experimental Individual-Level Validation of the Randomized Response Technique and the Crosswise Model.” PLoS One 13 (8): e0201770.

Höglinger, M., Jann, B., and Diekmann, A. 2016. “Sensitive Questions in Online Surveys: An Experimental Evaluation of Different Implementations of the Randomized Response Technique and the Crosswise Model.” Survey Research Methods 10 (3): 171–187.

Horvitz, D. G., and Thompson, D. J. 1952. “A Generalization of Sampling Without Replacement from a Finite Universe.” Journal of the American Statistical Association 47 (260): 663–685.

Jann, B., Jerke, J., and Krumpal, I. 2012. “Asking Sensitive Questions Using the Crosswise Model: An Experimental Survey Measuring Plagiarism.” Public Opinion Quarterly 76 (1): 32–49. https://doi.org/10.1093/poq/nfr036

Jerke, J., Johann, D., Rauhut, H., and Thomas, K. 2019. “Too Sophisticated Even for Highly Educated Survey Respondents? A Qualitative Assessment of Indirect Question Formats for Sensitive Questions.” Survey Research Methods 13 (3): 319–351.

Jerke, J., Johann, D., Rauhut, H., Thomas, K., and Velicu, A. 2020. “Handle with Care: Implementation of the List Experiment and Crosswise Model in a Large-Scale Survey on Academic Misconduct.” Field Methods. https://doi.org/10.1177/1525822X20985629

John, L. K., Loewenstein, G., Acquisti, A., and Vosgerau, J. 2018. “When and Why Randomized Response Techniques (Fail to) Elicit the Truth.” Organizational Behavior and Human Decision Processes 148: 101–123.

Kane, J. G., Craig, S. C., and Wald, K. D. 2004. “Religion and Presidential Politics in Florida: A List Experiment.” Social Science Quarterly 85 (2): 281–293.

Korndörfer, M., Krumpal, I., and Schmukle, S. C. 2014. “Measuring and Explaining Tax Evasion: Improving Self Reports Using the Crosswise Model.” Journal of Economic Psychology 45: 18–32. https://doi.org/10.1016/j.joep.2014.08.001

Krosnick, J. A. 1991. “Response Strategies for Coping with the Cognitive Demands of Attitude Measures in Surveys.” Applied Cognitive Psychology 5 (3): 213–236.

Kuhn, P. M., and Vivyan, N. 2018. “Reducing Turnout Misreporting in Online Surveys.” Public Opinion Quarterly 82 (2): 300–321.

Kuklinski, J. H., Cobb, M. D., and Gilens, M. 1997. “Racial Attitudes and the ‘New South’.” The Journal of Politics 59 (2): 323–349. https://doi.org/10.1017/S0022381600053470

Kundt, T. 2014. “Applying ‘Benford’s Law’ to the Crosswise Model: Findings from an Online Survey on Tax Evasion.” Available at SSRN 2487069.

Lehrer, R., Juhl, S., and Gschwend, T. 2019. “The Wisdom of Crowds Design for Sensitive Survey Questions.” Electoral Studies 57: 99–109.

Lyall, J., Blair, G., and Imai, K. 2013. “Explaining Support for Combatants During Wartime: A Survey Experiment in Afghanistan.” American Political Science Review 107 (4): 679–705. https://doi.org/10.1017/S0003055413000403

Malhotra, N., and Krosnick, J. A. 2007. “The Effect of Survey Mode and Sampling on Inferences About Political Attitudes and Behavior: Comparing the 2000 and 2004 ANES to Internet Surveys with Nonprobability Samples.” Political Analysis 15 (3): 286–323.

Meisters, J., Hoffmann, A., and Musch, J. 2020a. “Can Detailed Instructions and Comprehension Checks Increase the Validity of Crosswise Model Estimates?” PLoS One 15 (6): e0235403.

Meisters, J., Hoffmann, A., and Musch, J. 2020b. “Controlling Social Desirability Bias: An Experimental Investigation of the Extended Crosswise Model.” PLoS One 15 (12): e0243384.

Mercer, A., Lau, A., and Kennedy, C. 2018. “For Weighting Online Opt-In Samples, What Matters Most.” Pew Research Center.

Nasirian, M., et al. 2018. “Does Crosswise Method Cause Overestimation? An Example to Estimate the Frequency of Symptoms Associated with Sexually Transmitted Infections in General Population: A Cross Sectional Study.” Health Scope 7 (3): e55357. https://doi.org/10.5812/jhealthscope.55357

Oliveros, V., and Gingerich, D. W. 2020. “Lying About Corruption in Surveys: Evidence from a Joint Response Model.” International Journal of Public Opinion Research 32 (2): 384–395.

Oppenheimer, D. M., Meyvis, T., and Davidenko, N.. 2009. “Instructional Manipulation Checks: Detecting Satisficing to Increase Statistical Power.” Journal of Experimental Social Psychology 45 (4): 867–872.10.1016/j.jesp.2009.03.009CrossRef Google Scholar

Quatember, A. 2019. “A Discussion of the Two Different Aspects of Privacy Protection in Indirect Questioning Designs.” Quality & Quantity 53 (1): 269–282.10.1007/s11135-018-0751-4CrossRef Google Scholar

Reiber, F., Schnuerch, M., and Ulrich, R.. 2020. “Improving the Efficiency of Surveys with Randomized Response Models: A Sequential Approach Based on Curtailed Sampling.” Psychological Methods. https://doi.org/10.1037/met0000353 CrossRef Google Scholar

Rosenfeld, B., Imai, K., and Shapiro, J. N.. 2016. “An Empirical Validation Study of Popular Survey Methodologies for Sensitive Questions.” American Journal of Political Science 60 (3): 783–802.CrossRef Google Scholar

Schnapp, P. 2019. “Sensitive Question Techniques and Careless Responding: Adjusting the Crosswise Model for Random Answers.” methods, data, analyses 13 (2): 13. https://doi.org/10.12758/mda.2019.03 Google Scholar

Vakilian, K., Mousavi, S. A., and Keramat, A.. 2014. “Estimation of Sexual Behavior in the 18-to-24-Years-Old Iranian Youth Based on a Crosswise Model Study.” BMC Research Notes 7 (1): 28.CrossRef Google Scholar PubMed

Walzenbach, S., and Hinz, T.. 2019. “Pouring Water into Wine: Revisiting the Advantages of the Crosswise Model for Asking Sensitive Questions.” Survey Methods: Insights from the Field, 1–16. https://doi.org/10.13094/SMIF-2019-00002 Google Scholar

Warner, S. L. 1965. “Randomized Response: A Survey Technique for Eliminating Evasive Answer Bias.” Journal of the American Statistical Association 60 (309): 63–69.CrossRef Google Scholar PubMed

Waubert de Puiseau, B., Hoffmann, A., and Musch, J.. 2017. “How Indirect Questioning Techniques May Promote Democracy: A Preelection Polling Experiment.” Basic and Applied Social Psychology 39 (4): 209–217.CrossRef Google Scholar

Yu, J.-W., Tian, G.-L., and Tang, M.-L. 2008. “Two New Models for Survey Sampling with Sensitive Characteristic: Design and Analysis.” Metrika 67 (3): 251. https://doi.org/10.1007/s00184-007-0131-x

Table 1 Relative advantages of the crosswise model.

Figure 1 Consequences of inattentive respondents. Note: This figure illustrates how the conventional estimator (thick solid line) is biased toward 0.5, whereas the proposed bias-corrected estimator (thin solid line) captures the ground truth (dashed line). Both estimates are shown with bootstrapped 95% confidence intervals (with 1,000 replications). Each panel is based on our simulation, in which we set the number of respondents to $n=2,000$, the proportion of a sensitive anchor item to $\pi'=0$, and the proportions for nonsensitive items in the crosswise and anchor questions to $p=0.15$ and $p'=0.15$, respectively. The bias increases as the percentage of inattentive respondents increases and as the true prevalence rate ($\pi$) decreases. The top-left panel notes parameter values for all six panels (for notation, see the next section).
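The pull toward 0.5 described in the Figure 1 note can be checked with a short deterministic calculation. The sketch below is illustrative only and is not the cWise API. It assumes that attentive respondents choose the "both or neither" option with probability $\pi p + (1-\pi)(1-p)$ and that inattentive respondents choose either option with probability 0.5 (in the spirit of the Random Pick assumption). All function names, and the attentive share `gamma = 0.8` and true prevalence `pi = 0.1`, are hypothetical choices made for this example. The values $p = p' = 0.15$ and $\pi' = 0$ follow the figure note.

```python
def observed_share(pi, p, gamma):
    """Expected share choosing 'both or neither' (assumed response model).

    pi: prevalence of the sensitive item, p: known prevalence of the
    nonsensitive item, gamma: share of attentive respondents.
    """
    attentive = pi * p + (1.0 - pi) * (1.0 - p)
    return gamma * attentive + (1.0 - gamma) * 0.5


def naive_estimate(lam, p):
    """Conventional crosswise estimate from the observed share lam."""
    return (lam + p - 1.0) / (2.0 * p - 1.0)


def attentive_share(lam_anchor, pi_anchor, p_anchor):
    """Recover the attentive share from an anchor with known prevalence."""
    attentive = pi_anchor * p_anchor + (1.0 - pi_anchor) * (1.0 - p_anchor)
    return (lam_anchor - 0.5) / (attentive - 0.5)


def bias_corrected(lam, p, gamma_hat):
    """Undo the shrinkage: naive = gamma * pi + (1 - gamma) / 2."""
    return (naive_estimate(lam, p) - 0.5 * (1.0 - gamma_hat)) / gamma_hat


# Hypothetical scenario: 20% inattentive respondents, true prevalence 0.1.
pi, p, gamma = 0.1, 0.15, 0.8
pi_anchor, p_anchor = 0.0, 0.15

lam = observed_share(pi, p, gamma)
lam_anchor = observed_share(pi_anchor, p_anchor, gamma)

naive = naive_estimate(lam, p)
gamma_hat = attentive_share(lam_anchor, pi_anchor, p_anchor)
corrected = bias_corrected(lam, p, gamma_hat)

print(f"naive: {naive:.3f}, attentive share: {gamma_hat:.3f}, "
      f"corrected: {corrected:.3f}")
```

Under these assumptions the naive value equals `gamma * pi + (1 - gamma) / 2`, so it sits between the true prevalence and 0.5, and the gap widens as `gamma` falls or as `pi` moves away from 0.5. The calculation uses population shares rather than sampled data, so it shows the direction of the bias but says nothing about sampling variability or the bootstrapped intervals in the figure.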

Figure 2 Finite sample performance of the naïve and bias-corrected estimators. Note: This figure displays the bias, root-mean-square error, and the coverage of the 95% confidence interval of the naïve and bias-corrected estimators. The bias-corrected estimator is unbiased and consistent and has an ideal level of coverage.

Figure 3 When respondents with sensitive attributes tend to be more inattentive. Note: This graph illustrates, with simulated data, the naïve and bias-corrected estimators with 95% confidence intervals when the prevalence of sensitive attributes among inattentive respondents ($\pi_{\text{inattentive}}$) is higher than that among attentive respondents ($\pi_{\text{attentive}}$) (see the top-middle panel for parameter values). Each panel is based on our simulation, in which we set the number of respondents to $n=2,000$, the proportion of a sensitive anchor item to $\pi'=0$, and the proportions for nonsensitive items in the crosswise and anchor questions to $p=0$ and $p'=0$, respectively. The bias-corrected estimator captures the ground truth (dashed line) even when respondents with sensitive attributes tend to be more inattentive.

Figure 4 Sensitivity analysis of previous crosswise estimates. Note: This figure plots bias-corrected estimates of the crosswise model over varying percentages of inattentive respondents with the estimate based on direct questioning reported in each study.

Atsusaka and Stevenson Dataset

https://doi.org/10.7910/DVN/AHWMIL

Atsusaka and Stevenson supplementary material
