1 Introduction
In the social sciences, researchers often use ranking questions to study public opinion and preferences on various topics (Alvo and Yu 2014; Marden 1996). Using ranking questions, for example, scholars of American politics study media framing (Nelson, Clawson, and Oxley 1997), representation (Costa 2021; Tate 2004), blame attribution (Malhotra and Kuo 2008), political values (Ciuk 2016), and redistricting (Kaufman, King, and Komisarchik 2021). Comparative politics researchers also study nationalism (Miles and Rochefort 1991), post-materialism (Inglehart and Abramson 1994), candidate selection (Jankowski and Rehmert 2022), and ethnic identity (McMurry 2022), whereas international relations works examine foreign aid (Dietrich 2016), economic coercion (Gueorguiev, McDowell, and Steinberg 2020), and sexual violence (Agerberg and Kreft 2023). In addition to research, ranking questions are used in actual elections and election polling with ordinal ballots, such as ranked-choice voting (RCV), single transferable vote, Borda count, and Coombs rule (Atsusaka 2025; Shugart and Taagepera 2017).
Despite the wide use of ranking questions, relatively little has been discussed or understood about the general nature of measurement errors in them. For example, only 3 of 28 studies using ranking questions published in the American Political Science Review, the American Journal of Political Science, and the Journal of Politics between 2012 and 2023 mention potential measurement errors. Meanwhile, some methodological studies have examined measurement issues, but they focus only on specific aspects of ranking questions, such as item-order effects (Krosnick and Alwin 1987; Malhotra 2009; Serenko and Bontis 2013), question-order effects (Tranter and Western 2010), their advantage in eliciting relative preferences compared to other question types (Alwin and Krosnick 1985; Dillman, Smyth, and Christian 2014; Kaufman et al. 2021; Krosnick 1999; Krosnick and Alwin 1988; McCarty and Shrum 2000), and measurement issues by format (Blasius 2012; Genter, Trejo, and Nichols 2022; Smyth, Olson, and Burke 2018) or by devices and layouts (Revilla and Couper 2018).Footnote 1
In this paper, we introduce a general statistical framework for understanding measurement errors in ranking questions based on random responses—rankings based on arbitrary patterns independent of respondents' underlying preferences. With the framework, we clarify what ranking-based quantities researchers can study, what random responses look like in different formats, and why using observed data alone may induce measurement errors. Moreover, we propose simple design-based methods to correct the bias that measurement errors induce in various quantities of interest. Using item order randomization, we learn about the direction of the bias due to random responses. Leveraging anchor questions—auxiliary ranking questions whose correct answers are ex-ante known to researchers—we estimate the proportion of random responses. These two pieces of information allow our bias-corrected estimators to recover many quantities of interest in nonparametric and parametric analyses. In contrast to existing studies, the proposed framework is general and encompasses measurement issues from the various sources discussed for ranking questions, while also contributing to the growing scholarship on design-based methods to address measurement errors in survey research and beyond (Atsusaka and Stevenson 2023; Berinsky et al. 2024; Clayton et al. 2023; Horowitz and Manski 1995; Kane and Barabas 2019; Kane, Velez, and Barabas 2023; Tyler, Grimmer, and Westwood 2024).
At first glance, the problem of random responses seems resolvable by randomizing the order of items, and some of the most well-intended studies adopt this strategy.Footnote 2 However, we show that randomization alone does not remove the bias. Instead, randomization makes random responses follow a uniform distribution. Although this is significantly better than having no randomization, in which measurement errors have an unpredictable direction, the bias still remains under randomization—the distribution of observed rankings is now pulled towards a uniform distribution (i.e., indifference among items). This way, even under randomization, random responses can mask otherwise salient ranked preferences among respondents and “dilute” empirical results. Thus, understanding and overcoming the limitation of randomization has important implications for research using ranking questions.
Measurement errors in ranking questions can also have implications for electoral institutions and democratic representation. Improperly marked ranked ballots in RCV have received growing attention in light of its expanding adoption in American elections (Alvarez, Hall, and Levin 2018; Neely and Cook 2008; Neely and McDaniel 2015). For example, Atkeson et al. (2024) study voter confusion in RCV, arguing that voting errors may emerge from the complexity of ballots and the lack of information about candidates. Cormack (2024) examines over-voting—ranking the same candidate more than once—finding that its prevalence is higher in areas with lower education and income levels. Similarly, Pettigrew and Radley (2023) classify ballot-marking errors and ballot rejections, concluding that, on average, about 4.7% of voters make at least one type of error. Furthermore, Atsusaka (2025) analyzes ballot order effects—the effect of candidate order on the ballot on voters' entire candidate rankings—showing that about 0.6%–3.0% of voters may cast ranked ballots based on "donkey voting" (ranking candidates in the order they appear on the grid-style ballot). Thus, understanding measurement errors and their solutions may help assess the quality and validity of elections under RCV.
This paper is organized as follows. In Section 2, we introduce our motivating application of measuring and analyzing the relative importance of different identities. In Section 3, we define measurement errors from random responses and introduce our statistical framework for understanding their consequences. Section 4 introduces our design-based methodology to correct the resulting bias. In Section 5, we illustrate our methods with our empirical application and show that about 30% of respondents offered random responses, which can affect our conclusions. Section 6 provides extended analyses by comparing our methods to alternative designs and addressing a wider population of interest. In Section 7, we discuss our work's limitations and future directions. Our methods are available through the R package rankingQ.
2 Motivating Application: Relative Partisanship
Partisan identity has been one of the most influential variables in modern American politics (Green, Palmquist, and Schickler 2002; Huddy and Bankert 2017; West and Iyengar 2022). Although many works stress the centrality of partisan identity in political and social behavior, relatively little is understood about the relative importance of partisan identity (or any identity) compared to other core social identities relevant to politics (for exceptions, see Lee 2009; Spry 2017; Setzler and Yanus 2018). For example, Spry (2021, 434) notes that typical questions in identity and politics ask "respondents to report their closeness to one group at a time, [but] not to multiple groups within the same measure," and as a result, the conventional approach may miss "an opportunity to measure how close a respondent feels to one group category relative to other categories."
Using ranking questions, we seek to measure and analyze the multidimensionality of people's identities and what we call relative partisanship—the relative importance of partisan identity.Footnote 3 We obtain a representative sample of American adults through YouGov ($N=1,082$)Footnote 4 and ask respondents to rank four sources of their identities, including their (a) political party, (b) race, (c) gender, and (d) religion, according to their relative importance.Footnote 5
Figure 1 shows the ranking question used in the survey. In this particular example, religion is the first item on the list, followed by political party, gender, and race/ethnicity. For the reason we describe below, we randomize item order at the respondent level.

Figure 1 Ranking question to measure relative partisanship.
Our question follows a long tradition in political science of using ranking questions to study identity. For example, Miles and Rochefort (1991) test a theory of nationalism by examining whether people near the Nigeria–Niger border rank ethnic identity higher than national consciousness and religious affinity. Similarly, McCauley and Posner (2019) study the relative importance of religion in identity using the Côte d'Ivoire–Burkina Faso border as a quasi-natural experiment. McMurry (2022) also uses a rank-order question to assess the relative importance of tribe, religion, gender, and nationality in the Philippines. Recently, Hopkins, Kaiser, and Perez (2023, 12) study the relative importance of partisanship among Latinos and Asian Americans compared to "religion, job/occupation, gender, family role, political party, pan-ethnic group, national origin group, and being American." In addition, the Collaborative Multiracial Post-Election Survey (CMPS) surveys respondents on their relative identities, such as between national origin, race, and being American (e.g., Question A195_1 in the 2016 wave).
3 Statistical Framework for Measurement Errors
In this section, we introduce a statistical framework for measurement errors in ranking questions. The framework is general and applies to multiple formats of ranking questions, including radio buttons (grid-style), drag and drop, numeric entry, and select box formats. While the framework itself has been considered in existing studies (Horowitz and Manski 1995), we tailor our discussion to ranking questions and propose two design-based methods below.
3.1 Setup
Suppose there are J items that respondents rank. Here, we assume that (1) all respondents have well-defined preferences (completeness and transitivity) and (2) respondents rank all items. We also assume that each person has two potential ranking responses: non-random and random responses. First, we define non-random responses as responses based on underlying preferences or intentions. Non-random responses need not perfectly align with people's underlying preferences as long as they reflect respondents' substantive intent, whether sincere or strategic.
Next, we define random responses as meaningless responses that are independent of people's preferences—irrelevant to and unreflective of true preferences. Our definition is general, and random responses can take many forms. For example, Figure 2a visualizes what occurs when respondents provide the random response $(1, 2, 3, 4)$ in four commonly used ranking question formats. Here, random responders (i) use the same order as the presented list (drag and drop), (ii) draw a diagonal line from the top-left to the bottom-right (radio buttons), (iii) enter ranks in ascending numerical order (numeric entry), or (iv) do not reorder the presented items (select box). Many other patterns are also possible. Figure 2b provides three visually intuitive examples of random responses we call diagonalizing, zigzagging, and dog-legging in the radio-button format (for a similar discussion in RCV, see Atsusaka 2025).

Figure 2 Examples of random responses in ranking questions.
Let $Y^{*}_{i}$ and $e_{i}$ be respondent i's ($i=1, \ldots, N$) non-random and random responses (or errors), respectively. Let $z_{i}$ be a random variable denoting whether respondent i offers a non-random response ($z_{i}=1$) or otherwise ($z_{i}=0$). We denote respondent i's observed response by $Y^{\text{obs}}_{i} = Y^{*}_{i}z_{i} + e_{i}(1-z_{i})$. We use a general notation $g(\cdot)$ to represent a ranking-based quantity of interest (QOI).Footnote 6
Our framework represents the observed ranking data as a mixture of non-random and random responses:

$g(Y^{\text{obs}}_{i}) = g(Y^{*}_{i}|z_{i}=1)\Pr(z_{i}=1) + g(e_{i}|z_{i}=0)\Pr(z_{i}=0), \quad (1)$

where $\Pr(z_{i}=1)$ and $\Pr(z_{i}=0)$ are the proportions of non-random and random responses, respectively.
This illustrates why researchers cannot simply use raw data to study their variable of interest—the data contain both the information they wish to analyze and irrelevant noise that skews their understanding of the target concept. Appendix A.1 of the Supplementary Material formally illustrates the bias from random responses and discusses why it is consequential in empirical studies. The fundamental problem of random responses is that researchers have no way to know which part of their data is susceptible to errors (i.e., which respondents offer random responses) and to what extent.
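To make the mixture concrete, the following minimal R simulation (a stylized illustration with made-up quantities, not part of our replication code) generates data according to $Y^{\text{obs}}_{i} = Y^{*}_{i}z_{i} + e_{i}(1-z_{i})$ and shows how random responses distort even a simple summary of the rankings:

```r
set.seed(1)
J <- 4; N <- 5000
p_random <- 0.3                                   # illustrative Pr(z_i = 0)

# Non-random responses: rankings derived from latent utilities (item 1 most preferred)
rank_once <- function() rank(-(c(2, 1, 0, -1) + rnorm(J, sd = 1.5)))
Y_star <- t(replicate(N, rank_once()))

# Random responses: arbitrary patterns independent of preferences
# (uniform over all J! orderings, as under item order randomization)
e <- t(replicate(N, sample(1:J)))

z <- rbinom(N, 1, 1 - p_random)                   # z_i = 1: non-random responder
Y_obs <- Y_star * z + e * (1 - z)                 # observed data: the mixture in Equation (1)

# Naive average ranks are pulled toward 2.5 (indifference) relative to the truth
rbind(truth = colMeans(Y_star), observed = colMeans(Y_obs))
```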
3.2 Quantities of Interest and Identification Problems
There are two classes of quantities that interest researchers, differing in terms of the population of interest in which the target quantity is defined. The following distinction relates to the difference between the average engaged response among the engaged and the average engaged response discussed in Tyler et al. (2024).
The first quantity is a ranking-based quantity among non-random responses:

$\theta_{z} \equiv g(Y^{*}_{i}|z_{i}=1). \quad (2)$

This paper mainly studies the identification of $\theta_{z}$. We highlight that this estimand is defined only among people who offer non-random responses. Rearranging Equation (1), our identification problem becomes

$\theta_{z} = g(Y^{*}_{i}|z_{i}=1) = \frac{g(Y^{\text{obs}}_{i}) - g(e_{i}|z_{i}=0)\Pr(z_{i}=0)}{1 - \Pr(z_{i}=0)}. \quad (3)$

The right-hand side contains three quantities: (1) the target quantity based on observed rankings, $g(Y^{\text{obs}}_{i})$; (2) the proportion of random responses, $\Pr(z_{i}=0)$; and (3) the target quantity based on random responses, $g(e_{i}|z_{i}=0)$.

The key problem is that we only observe the first quantity, $g(Y^{\text{obs}}_{i})$. Thus, without making further assumptions, the QOI cannot be estimated from raw data alone, regardless of how many responses researchers collect. Section 4 discusses how our design-based methods allow us to point-estimate the latter two unknowns.
The second quantity of interest is a ranking-based quantity in the target population from which samples are drawn:

$\theta \equiv g(Y^{*}_{i}) \quad (4)$
$= g(Y^{*}_{i}|z_{i}=1)\Pr(z_{i}=1) + g(Y^{*}_{i}|z_{i}=0)\Pr(z_{i}=0), \quad (5)$

where $\theta$ becomes closer to $\theta_{z}$ as the probability of error-free responses $\Pr(z_{i}=1)$ increases.

Generally, $\theta$ is more difficult to identify than $\theta_{z}$ because it requires an additional assumption about the counterfactual quantity $g(Y^{*}_{i}|z_{i}=0)$—the non-random rankings that random respondents would have provided had they not responded randomly. In Section 6, we discuss three identification strategies for this population-level quantity.Footnote 7
4 Proposed Methodology
4.1 Overview
Figure 3 summarizes our design-based methodology, which leverages two survey designs: item order randomization and an anchor question.

Figure 3 Design-based methods for estimating the proportion and distribution of random responses.
4.2 Item Order Randomization
The first design consideration is item order randomization. Many survey platforms support randomization, and some studies use item order randomization with ranking questions (Costa 2021; Malhotra and Margalit 2014; Pradel et al. 2024; Rathbun, Rathbun, and Pomeroy 2022). The primary role of item order randomization is to identify the distribution of rankings among random responses and the direction of the bias with respect to our quantities of interest.
Our key theoretical result is that, under item order randomization, the rankings among random responses follow a uniform distribution with probability $\frac{1}{J!}$, where J is the number of items.Footnote 8 For example, with three items, random responses will correspond to one of the six profiles in the set $\{123, 132, 213, 231, 312, 321\}$ with probability $\frac{1}{6}$.Footnote 9 Let $U_{J}$ be a random variable that follows a discrete uniform distribution over the rankings of J items. Then, our result implies that

$g(e_{i}|z_{i}=0) = g(U_{J}). \quad (6)$

This result is powerful because it holds regardless of the patterns of random responses—item order randomization transforms all random responses into a set of rankings following the uniform distribution.

Our result also clarifies why the bias remains even after randomization. Integrating Equations 1 and 6, we can show that observed data still contain random responses under randomization:

$g(Y^{\text{obs}}_{i}) = g(Y^{*}_{i}|z_{i}=1)\Pr(z_{i}=1) + g(U_{J})\Pr(z_{i}=0). \quad (7)$

The above equation shows that, under randomization, random responses pull any estimates towards what researchers would observe if all respondents were indifferent among the available items. Thus, even under randomization, random responses still affect researchers' substantive inferences by "diluting" their conclusions.
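For intuition, the following short R simulation (illustrative only) tracks a random responder who always submits the recorded pattern $(1, 2, 3, 4)$—"diagonalizing" in the radio-button format. Once the item order is randomized, the implied substantive ranking is uniform over the $4! = 24$ profiles:

```r
set.seed(2)
items <- c("party", "race", "gender", "religion")
J <- length(items)

one_response <- function() {
  presented <- sample(items)            # randomized item order for this respondent
  ranking <- 1:J                        # recorded pattern: always 1, 2, 3, 4 down the grid
  names(ranking) <- presented
  paste(ranking[items], collapse = "")  # substantive ranking of (party, race, gender, religion)
}

profiles <- replicate(20000, one_response())
head(sort(table(profiles) / length(profiles), decreasing = TRUE))  # each profile ~ 1/24 = 0.042
```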
4.3 Anchor Questions
To identify the proportion of random responses at the time of the target ranking question, $\Pr(z_{i}=0)$, we propose using an auxiliary ranking question whose "correct answer" is ex-ante known to researchers (for a similar idea, see Atsusaka and Stevenson 2023). We call this an anchor question and ask it right before or after the target ranking question.Footnote 10 The item order in the anchor question must be randomized, just as it must be for the target question. To illustrate, Figure 4 presents the anchor question we included in our survey. In this example, the question asks respondents to rank four communities from the smallest to the largest, and the correct answer is (household, neighborhood, city or town, state).Footnote 11

Figure 4 Example of an anchor ranking question.
Estimating the Proportion of Random Responses in the Anchor Question
First, we estimate the proportion of random answers in the anchor question by using correct responses. Let $c_{i}$ be a binary variable indicating whether respondent i offers the correct answer in the anchor question ($c_{i} = 1$) or not ($c_{i} = 0$). Let $\Pr(z^{\text{anc}}_{i}=0)$ and $\Pr(z^{\text{anc}}_{i}=1)$ be the proportions of random and non-random responses in the anchor question, respectively.Footnote 12 Under item order randomization, we can estimate the proportion of random responses in the anchor question using the following estimator.

Proposition 1 Unbiased Estimator of the Proportion of Random Responses in the Anchor Question

$\widehat{\Pr}(z^{\text{anc}}_{i}=0) = 1 - \left[\frac{\sum_{i=1}^{N}c_{i}}{N} - \frac{1}{J!}\right]\left(1 - \frac{1}{J!}\right)^{-1}. \quad (9)$

The proof is in Appendix A.3 of the Supplementary Material. Equation 9 has an intuitive interpretation. The second term suggests that the proportion of non-random responses can be estimated from the proportion of correct answers $\frac{\sum_{i=1}^{N}c_{i}}{N}$ after accounting for the probability that random responses happen to be correct (which is $\frac{1}{J!}$ under item order randomization).Footnote 13 The third term can then be interpreted as a renormalization that ensures the resulting quantity is a probability.
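In code, the estimator in Equation (9) is a one-liner; applied to the anchor-question numbers reported in Section 5.1 (754 correct answers out of 1,082 with $J = 4$), it returns roughly 0.316 (the function name is ours, and survey weights are omitted for simplicity):

```r
# Proposition 1: estimated proportion of random responses in the anchor question
est_p_random <- function(correct, J) {
  1 - (mean(correct) - 1 / factorial(J)) / (1 - 1 / factorial(J))
}

c_i <- rep(c(1, 0), times = c(754, 1082 - 754))  # anchor correctness indicators
est_p_random(c_i, J = 4)                         # approximately 0.316
```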
Estimating the Proportion of Random Responses in the Target Question
Next, we estimate the proportion of random responses in the target question using the above result. To allow this extrapolation, we make the following assumption.
Assumption 1 Constant Proportion of Random Responses
The proportion of random responses remains constant across the target and anchor questions, or $\Pr(z^{\text{anc}}_{i}=0) = \Pr(z_{i}=0)$.
One key advantage of our approach is that it allows researchers to design their anchor questions so that Assumption 1 becomes more plausible—researchers can tailor an anchor question to their target question so that the two ranking questions have similar substantive topics, instruction length, the number and length of items, and locations in the survey (i.e., the anchor should come right before or after the target question). Appendix D of the Supplementary Material offers a practical guide for building anchor questions. Another advantage is that it does not assume individual-level randomness to be constant across the questions (see also Section 6.1).
4.4 Bias-Corrected Estimators
Integrating the above results, we propose two approaches to correct measurement errors. The first strategy is to directly correct the bias with a specific quantity of interest. The second approach is to apply the idea of inverse probability weighting (IPW).
4.4.1 Direct Correction
The first approach is to use the following bias-corrected estimator:

$\widehat{\theta}_{z} = \frac{\widehat{g}(Y^{\text{obs}}_{i}) - g(U_{J})\,\widehat{\Pr}(z_{i}=0)}{1 - \widehat{\Pr}(z_{i}=0)}. \quad (10)$

This estimator is simple and requires only one extra estimation compared to the naïve estimator $\widehat{g}(Y^{\text{obs}}_{i})$, while retaining the original sample size. Our proposed estimator has a wider confidence interval than the naïve estimator due to the additional uncertainty around the estimated proportion of correct answers. We use bootstrapping to construct confidence intervals. Moreover, researchers can include survey weights in $\widehat{g}(Y^{\text{obs}}_{i})$ as in typical survey data analysis, where survey weights represent the product of the design weight and a poststratification or calibration adjustment.
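As a sketch of the direct correction for one common quantity, the following R functions (with illustrative names) apply Equation (10) to the average rank of an item and construct a bootstrap confidence interval by resampling respondents; survey weights are omitted for brevity:

```r
# Direct bias correction for the average rank of one item.
# rank_j: observed ranks of item j; correct: anchor correctness indicator; J: number of items.
correct_avg_rank <- function(rank_j, correct, J) {
  p0 <- 1 - (mean(correct) - 1 / factorial(J)) / (1 - 1 / factorial(J))  # Proposition 1
  g_unif <- (J + 1) / 2                      # average rank under uniform (random) responses
  (mean(rank_j) - g_unif * p0) / (1 - p0)    # bias-corrected average rank
}

# Percentile bootstrap confidence interval
boot_ci <- function(rank_j, correct, J, B = 2000) {
  ests <- replicate(B, {
    idx <- sample(length(rank_j), replace = TRUE)
    correct_avg_rank(rank_j[idx], correct[idx], J)
  })
  quantile(ests, c(0.025, 0.975))
}
```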
4.4.2 Inverse Probability Weighting
The second strategy is to leverage the idea of IPW. Under this framework, the problem of measurement errors (Equation 1) can be considered an issue of selection bias. Figure 5 illustrates this idea graphically. Here, due to random responses, relatively popular rankings (e.g., 123) are under-sampled, while relatively unpopular rankings (e.g., 231) are over-sampled compared to their true distribution. A natural solution is to weight up a set of rankings that are supposed to be more prevalent and weight down a set of rankings that are supposed to be less prevalent than what raw data suggest.

Figure 5 Graphical representation of IPW.
Let $\mathbb{P}(Y^{*}_{i}|z_{i}=1)$ be the population proportion of respondent i's ranking profile given non-random responses. Let $\mathbb{P}(Y^{\text{obs}}_{i})$ be the same proportion based on observed data, and $w = \{w_{i}\}_{i=1}^{N}$ be a vector of weights for N respondents included in nonparametric or parametric analyses. We propose the following inverse probability weight:

$w_{i} = \left[\frac{\mathbb{P}(Y^{\text{obs}}_{i})}{\mathbb{P}(Y^{*}_{i}|z_{i}=1)}\right]^{-1}. \quad (11)$

The weights can be nonparametrically identified via the following plug-in estimator:

$\widehat{w}_{i} = \left[\frac{\widehat{\mathbb{P}}(Y^{\text{obs}}_{i})}{\widehat{\mathbb{P}}(Y^{*}_{i}|z_{i}=1)}\right]^{-1}. \quad (12)$

Let $\mathbb{P}(U_{J})$ be the uniform distribution with probability $\frac{1}{J!}$. Building on a similar derivation to Equation A.7, the denominator can be unbiasedly estimated with the following estimator:

$\widehat{\mathbb{P}}(Y^{*}_{i}|z_{i}=1) = \frac{\widehat{\mathbb{P}}(Y^{\text{obs}}_{i}) - \mathbb{P}(U_{J})\,\widehat{\Pr}(z_{i}=0)}{1 - \widehat{\Pr}(z_{i}=0)}. \quad (13)$

Researchers can also use survey weights in the IPW framework by constructing a new weight $w^{*}_{i} = w_{i}w_{i}^{s}$, where $w_{i}^{s}$ is respondent i's survey weight.
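To fix ideas, the following base-R sketch builds profile-level weights by plugging bias-corrected profile proportions into the ratio above; it illustrates the logic with our own variable names rather than reproducing rankingQ's implementation:

```r
# profile: character vector of full rankings such as "2314"; p0: estimated Pr(z_i = 0); J: items.
ipw_weights <- function(profile, p0, J) {
  p_obs  <- table(profile) / length(profile)                 # observed profile proportions
  p_unif <- 1 / factorial(J)                                 # uniform proportion under randomization
  p_true <- (p_obs - p_unif * p0) / (1 - p0)                 # bias-corrected profile proportions
  p_true <- pmax(p_true, 0); p_true <- p_true / sum(p_true)  # clip negatives and renormalize
  as.numeric(p_true[profile] / p_obs[profile])               # up-weight under-sampled profiles
}
```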
4.4.3 Methods Selection
The two approaches complement each other. Table 1 provides a comparison. We recommend that researchers use the direct approach whenever their target quantities are simple and nonparametrically identifiable (e.g., average ranks) as it provides exact bias correction to their QOIs. In contrast, when they wish to perform more complex and parametric analyses, such as running regressions, the IPW framework can be helpful, as it allows researchers to perform any analyses with the bias-correction weights.
Table 1 Comparison of two bias correction methods.

4.5 Uniformity Test
Finally, our framework also allows researchers to detect the presence of random responses without requiring any anchor questions. Appendix B of the Supplementary Material introduces the uniformity test, which shows that recorded responses (what respondents submit in Figure 2; see Appendix A.2 of the Supplementary Material) will follow a uniform distribution in the absence of random responses. Conversely, non-uniformity in the test suggests the presence of random responses.
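A minimal version of such a test in base R, assuming recorded responses are stored as pattern strings like "1234" (the helper names are ours), compares the observed pattern counts to the discrete uniform distribution over all $J!$ patterns with a chi-squared test:

```r
# Enumerate all J! recorded patterns
perms <- function(v) {
  if (length(v) == 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) out <- c(out, lapply(perms(v[-i]), function(p) c(v[i], p)))
  out
}

uniformity_test <- function(recorded, J) {
  patterns <- sapply(perms(1:J), paste, collapse = "")
  counts <- table(factor(recorded, levels = patterns))       # include zero-count patterns
  chisq.test(counts, p = rep(1 / factorial(J), factorial(J)))
}
```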
5 Measuring and Analyzing Relative Partisanship
Using our proposed method, we present the analysis of relative partisanship in American politics.Footnote 14 We focus on how bias-corrected estimates can differ from unadjusted estimates under different analyses, leaving more detailed analyses for future research. All analyses incorporate survey weights calculated by the polling firm.
Do our data contain random responses? To first address this question, Figure 6a shows the result of the uniformity test applied to our identity question. The figure visualizes the distribution of respondents’ recorded responses—the exact patterns they provided with respect to the four items presented in a given order. That is to say, the integers refer to submitted “patterns,” as shown in Figure 2b, whose substantive meanings differ across respondents. For example, “1234” means that respondents ranked the four items in the order they appear in the question, regardless of what items were presented to them.

Figure 6 Visualization of the uniformity test: Distribution over all possible recorded responses in the target and anchor questions.
Note: The dashed line represents 1/24 $\times$ 100%, to which the distribution should converge in the absence of random responses.
Since there are 4! = 24 possible ways to rank, the proportion (percentage) of recorded responses should converge to 1/24 = 0.042 (4.2%) in the absence of random responses (see A.4 for proof). In contrast, the graph shows clear evidence for non-uniformity—some recorded responses, notably 1234 (8.8%) and 4321 (6.7%), are more likely to occur than they are supposed to under the null (chi-squared test statistic = 68.45, p-value < 0.001), suggesting the presence of random responses in the data.
Checking for uniformity also validates the usage of our anchor question. Figure 6b applies the test to the anchor question only among respondents with correct answers. The result shows a more or less uniform distribution, and the $\chi^2$ test does not reject the null ($\chi^2$ test statistic = 32.70, with a p-value of 0.1066).Footnote 15 In contrast, Figure 6c visualizes the test among those who offer incorrect anchor responses. It offers clear evidence for non-uniformity, where about 20% of respondents submitted either 1234 or 4321 ($\chi^2$ test statistic = 107.95, p-value < 0.001).
5.1 Analysis of the Anchor Question
First, we estimate the proportion of random responses using our anchor question. Table 2 reports the number of each ranking response to the anchor question. We code 1234 (household < neighborhood < city or town < state) as the correct response ($c_{i}=1$) and all other rankings as incorrect responses ($c_{i}=0$). The result shows that the empirical proportion of correct responses is $\frac{\sum_{i=1}^{N}c_{i}}{N} = \frac{754}{1082} \approx 0.697$. Proposition 1 states that the proportion of random responses can be estimated as $1 - [\frac{754}{1082} - \frac{1}{24}](1 - \frac{1}{24})^{-1} \approx 0.316$. That is, we find that about 31.6% of ranking answers in the anchor question are random responses.Footnote 16
Table 2 Distribution of responses to the anchor question.

Note: 1234 corresponds to household $\rightarrow$ neighborhood $\rightarrow$ city or town $\rightarrow$ state, which is coded as the correct anchor response.
By invoking Assumption 1, we estimate that 31.6% of respondents offer random responses in our target ranking question. We believe that this is not an unreasonably high (or low) estimate. For example, Berinsky, Margolis, and Sances (2014), using four different screeners, show that the failure rates range from 34% to 41%. Relatedly, Atsusaka (2025) finds that about 31% and 37% of respondents in survey experiments offered the same ranking response to two different ranking questions. Moreover, Clayton et al. (2023) study measurement errors in conjoint experiments and estimate that about 19.3–27.0% of respondents offered different responses to an identical question that was asked twice.
In some cases, researchers may encounter “debatable cases,” in which multiple responses can be considered correct even after carefully designing anchor questions. For example, based on substantive knowledge, some researchers may think that for some people, the size of the community should be ordered as household < neighborhood < state < city or town. In another example, after analyzing data, analysts may find that some ranking responses are more prevalent than other incorrect answers (e.g., 4321 in Table 2). In these cases, analysts can code more than one response as correct answers. Moreover, researchers can also give credit to “partially correct answers,” if any, by coding such responses with known probability (e.g., coding an 80% correct response as correct with probability 0.8). Furthermore, it is also possible to use the most conservative (only one correct answer) and most liberal (as many correct answers as possible) coding schemes to make bounds for the resulting estimates.
Note that conservative (liberal) coding leans toward an over-estimation (under-estimation) of the prevalence of random responses in the anchor question. More importantly, the main focus should be on satisfying Assumption 1 when researchers consider different coding schemes for $c_{i}$. If anything, we recommend underestimating rather than overestimating the proportion of random responses in the target question because doing so leads to under-correction of the bias, which guards against inflating Type I errors. For example, if researchers suspect that there are more random responses in the anchor question than in the main question (e.g., the anchor looks like an attention check, which may cause more respondents to answer randomly), it would be better to code fewer anchor responses as correct answers when the coding is debatable.
5.2 Summarizing Data with Empirical Distributions
We begin our analysis by describing the distribution of our data. The left panel of Figure 7 presents the distribution of all possible rankings estimated with our methods, with the gray region indicating rankings in which party is ranked first. We find that, while there is great variation in ranking outcomes, people rarely rank political party as their first choice. This may provide evidence that relative partisanship is rather low among American adults—a notable finding given the emphasis on partisan identity in American political behavior. We also find that three orderings are particularly prevalent: (gender, race, religion, party), (gender, race, party, religion), and (religion, gender, race, party).

Figure 7 Distributions of identity rankings with bias-corrected and raw data.
The right panel of Figure 7 visualizes what researchers would have observed had our methods not been applied (with item order randomization still implemented). Here, many unpopular rankings (e.g., those starting with party) are overrepresented due to random responses. Indeed, the panel leads to a different conclusion—that party is roughly as important as race. Again, this demonstrates that random rankings, under item order randomization, pull the naïve estimates towards uniformity, where each ranking profile is equally prevalent.
5.3 Understanding Average Patterns
Next, we study the average ranks of the four items as another way to measure relative partisanship. Figure 8 visually compares the results based on our methods and raw data. Overall, we find that political party is, on average, ranked lowest, followed by religion, race and ethnicity, and gender. Consistent with our statistical argument (Appendix A.1 of the Supplementary Material), the difference between bias-corrected and unadjusted estimates (and thus the bias) is larger when the unknown target parameter is farther away from the average rank under uniformity (in this case, $\frac{1+4}{2} = 2.5$).

Figure 8 Average ranks with and without bias correction
Note: The dashed line represents the average rank that arises when people are indifferent among the four items.
For party and gender, bias-corrected estimates are statistically significantly different from unadjusted estimates and closer to the bound values (1 and 4). Accordingly, the difference between religion and party is 1.52 (direct) and 1.36 (IPW) times larger with our methods than with unadjusted estimates. Similarly, the difference between party and gender is 1.55 (direct) and 1.41 (IPW) times larger. In contrast, bias-corrected and raw-data estimates are similar for race and religion. This is consistent with our argument: while the bias pulls the estimated average ranks of race and religion towards 2.5, the unadjusted estimates of these two items were already close to that value. This illustrates that the magnitude of the bias, and hence the difference between bias-corrected and unadjusted estimates, varies not only with the proportion of random responses but also with the values of the target parameters. Thus, researchers should keep in mind that finding a small difference for a particular item after bias correction does not mean that the methods "failed" to address measurement errors.
Researchers can also estimate many other quantities of interest while applying bias correction. For example, our software, rankingQ, supports the pairwise ranking probability for items j and $j^{\prime}$, $\Pr(Y_{ij} < Y_{ij^{\prime}})$; the top-k ranking probability, $\Pr(Y_{ij} \leq k)$; and the marginal ranking probability, $\Pr(Y_{ij} = k)$, in addition to the average rank, $\mathbb{E}[Y_{ij}]$.
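For reference, weighted base-R versions of these quantities are straightforward to compute from an N-by-J matrix of ranks (this is an illustration of the definitions, not rankingQ's interface):

```r
# ranks: N x J matrix of ranks (1 = most important); w: combined bias-correction and survey weights
avg_rank      <- function(ranks, j, w)     weighted.mean(ranks[, j], w)                # E[Y_ij]
pairwise_prob <- function(ranks, j, jp, w) weighted.mean(ranks[, j] < ranks[, jp], w)  # Pr(Y_ij < Y_ij')
topk_prob     <- function(ranks, j, k, w)  weighted.mean(ranks[, j] <= k, w)           # Pr(Y_ij <= k)
marginal_prob <- function(ranks, j, k, w)  weighted.mean(ranks[, j] == k, w)           # Pr(Y_ij = k)
```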
5.4 Regression and Predicted Probability
Moreover, we analyze how respondent characteristics influence their relative partisanship while applying bias correction. To do so, we construct a Plackett–Luce model (also known as rank-order logistic regression) to associate people's ranking choices with their attributes (Alvo and Yu 2014; Train 2003). We regress identity rankings on age, gender, race, education level, ideology, partisanship, and region, while incorporating survey and bias-correction weights via the IPW framework. After estimation, we generate the predicted probabilities that people submit a particular ranking profile, with 95% confidence intervals via parametric bootstrapping (Tomz, Wittenberg, and King 2003), over the range of the ideology variable (7-point scale).
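For readers who want to see the weighted likelihood explicitly, the following self-contained R sketch writes down the rank-order (Plackett–Luce) log-likelihood with respondent covariates and IPW weights and maximizes it with optim(); the data layout and names are ours, and applied work can rely on existing implementations instead:

```r
# ranks: N x J matrix (1 = most important); X: N x K covariate matrix (include an intercept column);
# w: combined survey and bias-correction weights.  Item 1 serves as the reference item.
neg_loglik <- function(par, ranks, X, w) {
  J <- ncol(ranks); K <- ncol(X)
  beta <- rbind(0, matrix(par, nrow = J - 1, ncol = K))  # J x K item-specific coefficients
  util <- X %*% t(beta)                                  # N x J utilities
  ll <- 0
  for (i in seq_len(nrow(ranks))) {
    ord <- order(ranks[i, ])                             # items from most to least preferred
    for (s in 1:(J - 1)) {                               # sequential (exploded logit) stages
      rem <- ord[s:J]
      ll <- ll + w[i] * (util[i, ord[s]] - log(sum(exp(util[i, rem]))))
    }
  }
  -ll
}

# fit <- optim(rep(0, (ncol(ranks) - 1) * ncol(X)), neg_loglik,
#              ranks = ranks, X = X, w = w, method = "BFGS", hessian = TRUE)
```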
Here, we examine how ideology influences relative partisanship among Americans who are 40 years old, white, male, independent, with some college education, and living in the Northeast. We examine four ranking profiles, including the three most prevalent rankings discussed in Section 5.2 and the most prevalent ranking profile starting with party.
Figure 9 presents our results. The top-left panel shows that people are more likely to choose (gender, race, party, religion) as they become more liberal, all things being equal. The first difference in predicted probabilities between the most liberal and most conservative Americans with bias correction is roughly $0.313 - 0.067 = 0.246$, which is almost 1.6 times larger than its unadjusted counterpart of $0.203 - 0.046 = 0.157$. This illustrates how random responses can weaken the association between an independent variable and a target ranking profile.

Figure 9 Predicted probabilities with and without bias correction.
The top-right panel suggests that ideology only weakly relates to ranking (gender, race, religion, party). While both results show similar patterns, on average, bias-corrected predictions (0.175) are 0.077 points higher than unadjusted predictions (0.109). Thus, without correction, researchers would underestimate the prevalence of the target ranking profile by a factor of about 1.61. Finally, the lower two panels provide examples of relatively similar bias-corrected and unadjusted predictions. Importantly, this does not mean that our methods "did not work." Rather, it illustrates that the nature of the bias depends on the target ranking profile, the target independent variable, and the reference values at which other variables are fixed, in addition to the proportion of random responses.
6 Extended Analysis
6.1 Comparison with Alternative Designs
Listwise Deletion
Some readers may wonder how our methods differ from more traditional solutions relying on attention checks, repeated questions, and so on. More specifically, how is our proposal different from listwise deletion based on these alternative design considerations?Footnote 17
Let $z_{i}^{*}$ be a binary variable taking 1 if respondent i passes a certain instructional manipulation check and 0 otherwise. For example, $z_{i}^{*} = 1$ when respondent i passes an attention check, provides the same answer to the same question asked multiple times, or does not speed through the target question. Researchers then drop all respondents who did not pass the test (i.e., delete all i with $z_{i}^{*} = 0$) and produce "cleaned" data of size $N_c$, $\{Y_{i}^{\text{obs}}(z_{i}^{*} = 1)\}_{i=1}^{N_c}$.
When adopting this strategy, researchers often implicitly assume the following.
Assumption 2 Individually Constant Randomness
Random responders in the target ranking question are identical to those who fail the instructional manipulation check. Formally, $z_{i} = z^{*}_{i}$ for all $i = 1, \ldots, N$.
Invoking Assumption 2, listwise deletion identifies $\theta_{z}$ as follows:

$\theta_{z} = g(Y^{*}_{i}|z_{i}=1) = g(Y^{*}_{i}|z^{*}_{i}=1) \quad (14)$
$= g(Y^{\text{obs}}_{i}|z^{*}_{i}=1). \quad (15)$

This way, listwise deletion along with Assumption 2 allows researchers to estimate $\theta_{z}$ directly from the "cleaned" data. In other words, it is assumed that those who failed the test also provided random responses to the target ranking question. Importantly, this assumption requires that attention is stable for all respondents across the test and the target question. In this sense, Assumption 2 is much stronger than Assumption 1, which requires only the proportion of random responses to be the same.
Although the assumption is not directly verifiable, we collected auxiliary information to examine its plausibility in our survey. We find that Assumption 2 is indeed strong; as we show in Appendix C.3 of the Supplementary Material, even between two attention checks, there is very little correlation ($\rho = 0.25$). This is why we propose anchor questions—to approximate the randomness in the target ranking question by using a similar ranking question asked right before or after it.
Alternative Anchors
Researchers may also wish to try multiple anchor questions and study their effectiveness in pilot studies. To illustrate, we added two additional anchor questions that ask respondents, respectively, to (a) order four items alphabetically and (b) arrange them in the exact order we provide. Appendix C of the Supplementary Material compares the empirical results based on these anchors (along with results using listwise deletion based on attention checks and repeated questions).
As highlighted in Sections 4.4 and 5.1, researchers have full control over the choice and assessment of the measure used to estimate $\Pr(z = 0)$ (this is why our methodology is "design-based"). The key takeaways, however, are that any anchor question or instructional/factual manipulation check should be pretested and checked against a few sanity measures such as response time, placed adjacent to the main ranking question of interest, and preferably also given in a ranking format (Appendix C of the Supplementary Material).
6.2 Identification of $\theta$
We now propose three identification strategies for $\theta = g(Y^{*}_{i}) = g(Y^{*}_{i}|z_{i}=1)\Pr(z_{i}=1) + g(Y^{*}_{i}|z_{i}=0)\Pr(z_{i}=0)$—the ranking-based quantity among all people in the target population. The key is to identify $g(Y_{i}^{*}|z_{i}=0)$, which is a function of the counterfactual rankings that random respondents would have provided had they responded non-randomly.
The first approach is to assume that those who provide random responses are indifferent among the available items. More specifically, we assume that the counterfactual ranking of J items is a uniformly distributed random variable $U_{J}$.

Assumption 3 Uniform Preference
$Y^{*}_{i}(z_{i} = 0) = U_{J}$.
This assumption is plausible, for example, when respondents offer random responses because they do not have sufficient information about available options. Here, randomness and preference are correlated. For example, in RCV elections, voters with low education levels may be more likely to provide random responses and have uniform preferences as they have less contextual knowledge to rank multiple candidates.
With Assumption 3, it is straightforward to compute $g(Y^{*}_{i}|z_{i} = 0)$ using a uniform distribution and then estimate $\theta$ accordingly. However, our design-based methods provide an even simpler solution. Using item order randomization, we can show that

$\theta = g(Y^{*}_{i}|z_{i}=1)\Pr(z_{i}=1) + g(Y^{*}_{i}|z_{i}=0)\Pr(z_{i}=0) \quad (16)$
$= g(Y^{*}_{i}|z_{i}=1)\Pr(z_{i}=1) + g(U_{J})\Pr(z_{i}=0) \quad (17)$
$= g(Y^{*}_{i}|z_{i}=1)\Pr(z_{i}=1) + g(e_{i}|z_{i}=0)\Pr(z_{i}=0) \quad (18)$
$= g(Y^{\text{obs}}_{i}). \quad (19)$

In other words, $\theta$ can be estimated directly from raw data alone.
A second approach is to assume that random respondents would have submitted similar rankings to non-random respondents. More specifically, we assume the following.

Assumption 4 Contaminated Sampling
$Y^{*}_{i} \perp z_{i}$.
This assumption is plausible, for example, when random responses stem from simple misunderstandings, confusion, or mistakes that prevent respondents from expressing their underlying preferences. We call this assumption contaminated sampling, building on Horowitz and Manski (1995). With Assumption 4, researchers can identify $\theta$ by replacing counterfactual rankings with observed ones as follows:

$\theta = g(Y^{*}_{i}|z_{i}=1)\Pr(z_{i}=1) + g(Y^{*}_{i}|z_{i}=0)\Pr(z_{i}=0) \quad (20)$
$= g(Y^{*}_{i}|z_{i}=1)\Pr(z_{i}=1) + g(Y^{*}_{i}|z_{i}=1)\Pr(z_{i}=0) \quad (21)$
$= g(Y^{*}_{i}|z_{i}=1) \quad (22)$
$= \theta_{z}. \quad (23)$
Assumption 4 is violated whenever there exists a confounder that relates to both randomness and preference. Our final approach is to relax the assumption by conditioning on such a confounder. Let $\mathbf{X}_{i}$ be a set of covariates that are related to both random responding $z_{i}$ and preference $Y_{i}^{*}$. We assume the following.

Assumption 5 Stratified Contaminated Sampling
$Y^{*}_{i} \perp z_{i} \mid \mathbf{X}_{i}$.
For simplicity, consider a single confounder. Let $\mathbf{x}$ be a specific covariate value and $\mathcal{X}$ be its sample space. Combined with Equation 23, we propose the following identification strategy via stratification:

$\theta = \sum_{\mathbf{x} \in \mathcal{X}} g(Y^{*}_{i}|\mathbf{X}_{i}=\mathbf{x})\Pr(\mathbf{X}_{i}=\mathbf{x}) \quad (24)$
$= \sum_{\mathbf{x} \in \mathcal{X}} g(Y^{*}_{i}|z_{i}=1, \mathbf{X}_{i}=\mathbf{x})\Pr(\mathbf{X}_{i}=\mathbf{x}). \quad (25)$
In other words, we compute the weighted average of $\theta_{z}$ within each distinct category defined by the covariate, where the weight is the proportion of each stratum, $\Pr(\mathbf{X}_{i}=\mathbf{x})$. For example, suppose that strength of partisanship is related to both random responding and identity ranking. Then, researchers can estimate $\theta$ by estimating $\theta_{z}$ within groups of people who report the same partisan strength and then summing the estimates, weighting them by the proportions of the groups.
To illustrate the three strategies, Figure 10 presents estimates of $\theta$ (average rank) under the three different assumptions. We use partisan strength (Independent, Weak Partisan, Strong Partisan) to illustrate the stratification approach, which yields estimates similar to the contaminated sampling approach. The uniform preference approach yields estimates closer to $\frac{1 + 4}{2} = 2.5$ than the other two methods, consistent with its assumption. This way, researchers can extend their inference to $\theta$ by leveraging substantive knowledge about why random responses may occur in their specific application.

Figure 10 Average ranks in the entire population under different assumptions.
Note: The dashed line represents the average rank that arises when people are indifferent among the four items.
7 Concluding Remarks
We introduced a statistical framework to quantify and address measurement errors in ranking survey questions due to random responses. We show that two additional survey designs—item order randomization and a paired anchor ranking question—help us learn about the direction and magnitude of measurement errors, enabling our bias corrections. Without any corrections, substantive conclusions can be biased in completely unpredictable directions. Even with the current best practice of item order randomization, random responses may conceal otherwise interesting patterns in ranking data. More specifically, we illustrated that measurement errors pull the distribution of observed rankings towards a uniform distribution under randomization, still affecting our inferences.
Using a motivating application that measures relative identities, we show that more than 30% of respondents can fall prey to random responses and that not accounting for this can affect our substantive conclusions. We also validate our methods by showing that recorded responses among respondents who pass the anchor question are close to uniformly distributed, while those among respondents who fail it are wildly non-uniform. Our framework provides a heightened understanding of why observed ranking data may be contaminated and what information we need to correct the resulting bias.
Although our current framework focuses on full-ranking questions, it can be extended in several ways in future studies. For example, future research may study how our theoretical results change in more complicated situations that allow partial rankings, top-k rankings, and ties. Moreover, our methods can be extended to other discrete-choice questions, such as binary, multinomial, and ordered-choice questions. In fact, many of our methods, including the uniformity test, randomization, and anchor questions, are readily applicable to many discrete-choice questions, although such applications may involve unique challenges (e.g., the inability to randomize option order in ordered-response questions). With these future directions, this work contributes to a growing body of design-based methodologies to counter measurement errors in survey research. We hope this work is also informative to election administrators and election science scholars as the number of jurisdictions considering RCV increases.
Acknowledgments
This study was approved by the Institutional Review Board (IRB) at American University (Study ID: IRB-2023-189). We thank Jim Bisbee, Bernie Grofman, Diana Da In Lee, Ryan Moore, Carlisle Rainey, Matt Tyler, and four anonymous reviewers for their feedback. We also thank the attendees of the EPOVB Conference 2023, Japanese Society for Quantitative Political Science Winter Meeting 2024, PolMeth XL, KAIS 2023 Annual Meeting, Seoul National University's Statistics Department workshop, and Harvard University's applied statistics workshop. Our accompanying R package rankingQ is available at https://github.com/sysilviakim/rankingQ.
Author Contributions
Both authors contributed equally and are listed in alphabetical order.
Data Availability Statement
Replication materials can be found on the Political Analysis Harvard Dataverse for Atsusaka and Kim (2024) at https://doi.org/10.7910/DVN/UCTXEF. A copy of the same code and data can also be accessed at https://github.com/sysilviakim/ranking_error.
Supplementary Material
For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2024.24.