
Analysis of the Weighted Kappa and Its Maximum with Markov Moves

Published online by Cambridge University Press:  01 January 2025

Fabio Rapallo*
University of Genova

*Correspondence should be made to Fabio Rapallo, Department of Economics, University of Genova, Via Francesco Vivaldi 5, 16126 Genoa, Italy. Email: [email protected]

Abstract

In this paper, the notion of Markov move from algebraic statistics is used to analyze the weighted kappa indices in rater agreement problems. In particular, the problem of the maximum kappa and its dependence on the choice of the weighting schemes are discussed. The Markov moves are also used in a simulated annealing algorithm to actually find the configuration of maximum agreement.

Type
Theory and Methods
Creative Commons
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Copyright
Copyright © 2022 The Author(s)

The analysis of rater agreement is currently one of the most active and relevant research areas in categorical data analysis. Even in the simplest case, where two or more observers rate a common set of n objects on the same rating scale, the literature offers several indices to summarize the agreement, each with its own paradoxes, counterexamples, and unexpected behaviors. Indeed, the wide range of available indices is a symptom of the difficulty of interpreting the results. For a general survey, the reader can refer to Fleiss et al. (2003), von Eye and Mun (2004), or Shoukri (2010).

The most popular measures of agreement, at least in the two-rater case, are Cohen's $\kappa$ and the weighted Cohen's $\kappa_w$. First introduced in Cohen (1960) and Cohen (1968), respectively, these two indices have been analyzed, criticized, and generalized to the multi-rater case, to incomplete rating schemes, and so on. For instance, in the multi-rater case the most popular extension of Cohen's $\kappa$ is Conger's $\kappa_C$, introduced in Conger (1980), which considers the pairwise agreement in all possible two-way marginal tables; see the discussion and the examples in Vanbelle (2019). In all cases, the rationale behind such indices is to measure rater agreement beyond chance, in the sense that the value of the indices should be zero when the raters are completely independent.
In this paper, we restrict our attention to the weighted Cohen's $\kappa_w$ and its extensions to the multi-rater case, paying special attention to the connections between the choice of the weighting scheme and the maximum attainable value of such indices.

A first issue with the kappa-type statistics considered in this paper is normalization. It is known that kappa-type statistics are not straightforward to interpret, since their maximum is 1 only when the marginal distributions are homogeneous. With non-homogeneous margins, the maximum value can be considerably less than 1. As is customary in statistics when working with indices, one therefore needs to compute the maximum value attainable by an index in order to compare the observed value with that maximum. Some attempts have been made toward finding the maximum value of the kappa statistics. For instance, Umesh et al. (1989) introduced a procedure where the maximum agreement is found by fixing the observed agreement and varying the marginal distributions. When the margins are fixed, however, computing the maximum attainable kappa is relatively easy only for the unweighted $\kappa$ in the two-rater setting, see, e.g., Sim and Wright (2005). The problem is harder in the weighted case and in the multi-rater setting, as we will illustrate extensively.
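For the easy case (unweighted $\kappa$, two raters, fixed margins), a commonly used formula takes the largest attainable observed agreement to be $\sum_i \min(p_{i+}, p_{+i})$, i.e., as much mass on the diagonal as the margins allow, and plugs it into Eq. (1). The Python sketch below assumes NumPy; the helper name `kappa_max_unweighted` is hypothetical, and the sketch does not cover the weighted or multi-rater cases that the paper addresses.

```python
import numpy as np


def kappa_max_unweighted(row_margin, col_margin):
    """Maximum unweighted Cohen's kappa with fixed margins (two raters).

    row_margin, col_margin: counts or proportions for the k categories.
    Hypothetical helper, for illustration only.
    """
    p_r = np.asarray(row_margin, dtype=float)
    p_c = np.asarray(col_margin, dtype=float)
    p_r, p_c = p_r / p_r.sum(), p_c / p_c.sum()
    p_e = float(p_r @ p_c)                        # expected agreement: sum_i p_{i+} p_{+i}
    p_o_max = float(np.minimum(p_r, p_c).sum())   # largest diagonal mass the margins allow
    return (p_o_max - p_e) / (1.0 - p_e)


print(kappa_max_unweighted([2, 3, 5], [2, 3, 5]))  # homogeneous margins
print(kappa_max_unweighted([1, 1, 0], [0, 1, 1]))  # non-homogeneous margins
```

With homogeneous margins the bound is 1; with the second pair of margins it drops to 1/3.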

A second issue addressed in this paper is the dependence of the kappa-type statistics on the choice of the weights. The unweighted version of Cohen's $\kappa$ only distinguishes between agreement cells and disagreement cells, and is therefore used for ratings on a nominal scale. When the rating scale is ordinal, or more generally when some disagreements are more serious than others, the weighted $\kappa_w$ should be preferred. In this case, the choice of the weights is a delicate issue, and it is known that different weighting schemes lead to quite different results. Moreover, although the main weighting schemes (i.e., linear and quadratic) have been studied extensively, further analysis is needed to better understand the role played by the weights in the behavior of kappa-type statistics.
In a recent paper, Kvålseth (2018) highlights the dependence of the weighted $\kappa_w$ on the choice of the weights and discusses extensively how to interpret $\kappa_w$ values as functions of the weighting scheme. The author motivates the study of the properties of the weights by arguing that, without a clear understanding of the connections between the weights and $\kappa_w$, the weighted $\kappa_w$ itself is not a satisfactory index to describe agreement in an ordinal context. This problem is therefore also considered here.

Both points described above are analyzed in this paper with the aid of algebraic statistics. We give insights and new results that provide a precise understanding of how the weights act in the computation of kappa, which supports a correct interpretation of weighted kappa statistics. The analysis is carried out by means of Markov moves, well-known tools in algebraic statistics for the analysis of contingency tables. In particular, we show how the properties of the weights affect the structure of the configuration of maximum agreement. The use of algebraic statistics for rater agreement analysis has been considered in other works, but mainly for computational purposes; for instance, Markov bases are used to perform exact tests in this framework in Rapallo (2003) and Rapallo (2005).

Without introducing formal definitions here (see the next section), let us present a simple example taken from von Eye and Mun (2004, p. 74). Two psychiatrists, $P_1$ and $P_2$, rate the severity of depression of 129 patients on a three-level ordinal scale. The observed data are in Fig. 1 (left). A table with the same margins and maximum agreement under linear weights is in Fig. 1 (center), while the table with the same margins and maximum agreement under quadratic weights is in Fig. 1 (right).

Figure 1. Two psychiatrists’ rating of severity of depression. The observed table (left), a table with the same margins and maximum agreement with linear weights (center), and the table with the same margins and maximum agreement with quadratic weights (right).

This simple example already highlights some counterintuitive facts. For instance, the maximum agreement is not always obtained by maximizing the counts on the diagonal: with quadratic weights, the maximum is not attained in a table that fills the main diagonal as much as possible. We will show that filling the diagonal is a good strategy to increase the agreement only if the weights define a distance on the ground set, which is not the case, for instance, for quadratic weights. On this point, the available tools for computing the maximum agreement fail: the R package rel (LoMartire 2020) and, for the two-rater setting, some online calculators, e.g., Lowry (2020) at the date of submission, do not give the correct answer.
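The effect can be seen on a hypothetical toy example that is not taken from the paper. With fixed margins, the denominator of the second expression in Eq. (2) does not change, so maximizing $\kappa_w$ amounts to minimizing the observed weighted disagreement. The Python sketch below (NumPy assumed; names are illustrative) compares two $3 \times 3$ tables with row margins (1, 1, 0) and column margins (0, 1, 1): table `A` puts one count on the diagonal and one in a cell with $|i-j| = 2$, while table `B` leaves the diagonal empty and has two cells with $|i-j| = 1$.

```python
import math

import numpy as np

k = 3
d = np.abs(np.subtract.outer(np.arange(k), np.arange(k)))  # |i - j|
weights = {
    "quadratic": d ** 2 / (k - 1) ** 2,      # Eq. (3)
    "linear": d / (k - 1),                   # Eq. (4)
    "sqrt": np.sqrt(d) / math.sqrt(k - 1),   # Eq. (6)
}


def kappa_w(n, u):
    """Weighted kappa via the second expression of Eq. (2), from a table of counts."""
    p = n / n.sum()
    r, c = p.sum(axis=1), p.sum(axis=0)
    return float(1 - np.sum(u * p) / np.sum(u * np.outer(r, c)))


# Same margins: rows (1, 1, 0), columns (0, 1, 1).
A = np.array([[0, 0, 1], [0, 1, 0], [0, 0, 0]])  # one diagonal count, one |i-j| = 2 cell
B = np.array([[0, 1, 0], [0, 0, 1], [0, 0, 0]])  # empty diagonal, two |i-j| = 1 cells

for name, u in weights.items():
    print(f"{name:9s} kappa(A) = {kappa_w(A, u):+.4f}  kappa(B) = {kappa_w(B, u):+.4f}")
```

In this toy case, quadratic weights rank `B` (empty diagonal) above `A`, linear weights give a tie, and sqrt weights rank `A` above `B`, which is in line with the statement that filling the diagonal pays off when the weights are a distance.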

Another interesting point about the above example is that, with linear weights, several configurations share the same value of kappa. The tables in Fig. 1 (center and right) have the same weighted kappa with linear weights (0.6089), although at first sight they look rather different, and the weighted kappa with quadratic weights ranges from 0.6007 to 0.6909. We will see that in this simple example all tables with maximum agreement under linear weights are easily obtained, since only one Markov move can be applied.

Finally, we exploit the Markov moves again to introduce a simple simulated annealing algorithm that finds the maximum agreement. In particular, we treat the marginal distributions as fixed and consider all multivariate tables with fixed one-way margins. The proposed algorithm can be applied with a general weighting scheme, not only with linear or quadratic weights. Notice that the problem can also be tackled within the theory of integer linear programming (maximize the kappa statistic with the margins held fixed), but Markov moves provide a flexible tool and a solution that is easy to explain.
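A minimal sketch of the idea for the two-rater case follows; it is not the algorithm of Sect. 4, and several choices are assumptions. The moves are the basic $2 \times 2$ moves (+1 at $(i, j)$ and $(i', j')$, -1 at $(i, j')$ and $(i', j)$), which keep row and column sums fixed and are known to form a Markov basis for two-way tables with fixed margins. Since the margins are fixed, the sketch minimizes the observed weighted disagreement $\sum_{ij} u_{ij} n_{ij}$, which is equivalent to maximizing $\kappa_w$. The function name, the geometric cooling schedule, and the values of `n_iter`, `t0`, `cooling`, and `seed` are arbitrary illustrative choices.

```python
import math
import random

import numpy as np


def weighted_disagreement(n, u):
    """Observed weighted disagreement sum_ij u_ij n_ij (u_ii = 0)."""
    return float(np.sum(u * n))


def anneal_two_rater(n0, u, n_iter=20000, t0=1.0, cooling=0.9995, seed=0):
    """Simulated annealing over k x k tables with the margins of n0 (hypothetical helper)."""
    rng = random.Random(seed)
    n = np.array(n0, dtype=int)
    k = n.shape[0]
    cost = weighted_disagreement(n, u)
    best, best_cost = n.copy(), cost
    t = t0
    for _ in range(n_iter):
        i, i2 = rng.sample(range(k), 2)      # two distinct rows
        j, j2 = rng.sample(range(k), 2)      # two distinct columns
        # Basic move: +1 at (i, j), (i2, j2) and -1 at (i, j2), (i2, j).
        if n[i, j2] > 0 and n[i2, j] > 0:    # otherwise a count would become negative
            delta = u[i, j] + u[i2, j2] - u[i, j2] - u[i2, j]
            if delta <= 0 or rng.random() < math.exp(-delta / t):
                n[i, j] += 1
                n[i2, j2] += 1
                n[i, j2] -= 1
                n[i2, j] -= 1
                cost += delta
                if cost < best_cost:
                    best, best_cost = n.copy(), cost
        t *= cooling                         # geometric cooling schedule
    return best, weighted_disagreement(best, u)


# Toy example: quadratic weights for k = 3, margins (1, 1, 0) and (0, 1, 1).
d = np.abs(np.subtract.outer(np.arange(3), np.arange(3)))
u_quad = d ** 2 / 4
start = np.array([[0, 0, 1], [0, 1, 0], [0, 0, 0]])
best, cost = anneal_two_rater(start, u_quad)
print(best, cost)
```

Uphill moves are accepted with probability $\exp(-\Delta/T)$ so the search can leave local optima early on, and the best table visited is returned. The multi-rater case would need the corresponding Markov basis instead of the $2 \times 2$ moves.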

The paper is organized as follows. In Sect. 1 we recall the notation and the basic definitions of Cohen's $\kappa$, the weighted Cohen's $\kappa_w$, and Conger's $\kappa_C$ and $\kappa_{C,w}$ for the multi-rater case. In Sect. 2, we compute the Markov bases for the rater agreement problem in the two-rater and multi-rater cases. These Markov bases are used in Sect. 3 to state some results on the structure of the configuration of maximum agreement in connection with the (metric) properties of the weighting schemes. Section 4 illustrates a simulated annealing algorithm to actually find the configuration of maximum agreement, and Sect. 5 presents and discusses the results of a simulation study. Finally, Sect. 6 contains some concluding remarks and pointers to future directions.

1. Notation and Basic Recalls

In this section, we briefly review the basic definitions about the kappa-type indices of agreement which will be used in the paper. We first focus on the two-rater setting.

Let us consider the ratings of the two raters as a pair of random variables X and Y on the set $\{1, \ldots, k\}$, or more generally on a finite ground set $\{x_1, \ldots, x_k\}$. Let us denote by $p_{ij}$ the probability of the cell (i, j), and by $p_{i+}$ ($i = 1, \ldots, k$) and $p_{+j}$ ($j = 1, \ldots, k$) the marginal distributions of X and Y, respectively. Cohen's $\kappa$ is defined as:

$$\kappa = \frac{\sum_{i=1}^k p_{ii} - \sum_{i=1}^k p_{i+}p_{+i}}{1 - \sum_{i=1}^k p_{i+}p_{+i}} = 1 - \frac{\sum_{(i,j) \in D} p_{ij}}{\sum_{(i,j) \in D} p_{i+}p_{+j}} \, , \tag{1}$$

where $D = \{(i,j) : i \ne j\}$ is the set of the disagreement cells.
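As a quick numerical check of Eq. (1), the Python sketch below (NumPy assumed; the function name is hypothetical) evaluates both expressions on a table and returns them side by side. They agree up to rounding because $1 - \sum_i p_{ii}$ and $1 - \sum_i p_{i+}p_{+i}$ are exactly the corresponding sums over the disagreement cells D.

```python
import numpy as np


def cohen_kappa_two_forms(table):
    """Return Cohen's kappa computed with both expressions of Eq. (1)."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                          # joint proportions p_ij
    r, c = p.sum(axis=1), p.sum(axis=0)      # margins p_{i+} and p_{+j}
    p_o, p_e = np.trace(p), float(r @ c)
    first = (p_o - p_e) / (1.0 - p_e)
    off = ~np.eye(p.shape[0], dtype=bool)    # mask of the disagreement cells D
    second = 1.0 - p[off].sum() / np.outer(r, c)[off].sum()
    return float(first), float(second)


print(cohen_kappa_two_forms([[20, 5, 1], [4, 15, 6], [0, 3, 25]]))
```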

Given a matrix of agreement weights $W = (w_{ij})$ with $0 \le w_{ij} < 1$ for all $i, j$ with $i \ne j$, and $w_{ii} = 1$ for all $i$, the weighted kappa is:

$$\kappa_w = \frac{\sum_{i,j=1}^k w_{ij}p_{ij} - \sum_{i,j=1}^k w_{ij}p_{i+}p_{+j}}{1 - \sum_{i,j=1}^k w_{ij}p_{i+}p_{+j}} = 1 - \frac{\sum_{(i,j) \in D} u_{ij} p_{ij}}{\sum_{(i,j) \in D} u_{ij}p_{i+}p_{+j}} \, , \tag{2}$$

where, in the second expression, $u_{ij} = 1 - w_{ij}$. Although this is not strictly necessary for the theory of rater agreement, we assume that the matrices W and $U = (u_{ij})$ are symmetric, because some of our results are based on the properties of metric functions, where symmetry is one of the axioms. In the previous formulas, the $u_{ij}$ are disagreement weights, and it is easily seen that $u_{ij} = 0$ on the main diagonal and $0 < u_{ij} \le 1$ for $i \ne j$.
When a sample is available, the indices $\kappa$ and $\kappa_w$ are estimated by replacing the theoretical probabilities in Eqs. (1) and (2) with the corresponding sample proportions. On a sample of size N, we denote by $n_{ij}$ the count of the cell (i, j), so that the sample proportion is $\hat{p}_{ij} = n_{ij}/N$.
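The sample version of Eq. (2) can be sketched in Python as follows, assuming NumPy, a $k \times k$ array of counts $n_{ij}$, and a disagreement-weight matrix U with zero diagonal; both function names are hypothetical. The first function uses the second expression of Eq. (2), and the second one uses the first expression with $W = 1 - U$, so the two can be compared.

```python
import numpy as np


def weighted_kappa(counts, u):
    """Sample weighted kappa from counts n_ij and disagreement weights u_ij (Eq. (2), second form)."""
    n = np.asarray(counts, dtype=float)
    p = n / n.sum()                          # sample proportions n_ij / N
    r, c = p.sum(axis=1), p.sum(axis=0)
    observed = np.sum(u * p)                 # sum over D of u_ij p_ij (u_ii = 0)
    expected = np.sum(u * np.outer(r, c))    # sum over D of u_ij p_{i+} p_{+j}
    return float(1.0 - observed / expected)


def weighted_kappa_agreement_form(counts, w):
    """Same index via the first form of Eq. (2), with agreement weights w_ij."""
    n = np.asarray(counts, dtype=float)
    p = n / n.sum()
    r, c = p.sum(axis=1), p.sum(axis=0)
    p_o, p_e = np.sum(w * p), np.sum(w * np.outer(r, c))
    return float((p_o - p_e) / (1.0 - p_e))


k = 3
u_lin = np.abs(np.subtract.outer(np.arange(k), np.arange(k))) / (k - 1)
counts = [[20, 5, 1], [4, 15, 6], [0, 3, 25]]
print(weighted_kappa(counts, u_lin), weighted_kappa_agreement_form(counts, 1 - u_lin))
```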

The most commonly used weighting schemes include:

  (a) the quadratic weights (see Fleiss and Cohen 1973):

    $$u_{ij} = \frac{(i-j)^2}{(k-1)^2} \tag{3}$$
  (b) the linear weights (see Cicchetti and Allison 1971):

    $$u_{ij} = \frac{|i-j|}{k-1} \tag{4}$$

Moreover, the unweighted $\kappa$ in Eq. (1) can be considered as a special case of the weighted $\kappa_w$ obtained by setting

$$u_{ij} = \begin{cases} 0 & \text{for } i = j \\ 1 & \text{otherwise} \end{cases} \tag{5}$$
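The weighting schemes in Eqs. (3), (4), and (5) can be built as $k \times k$ disagreement matrices U in a few lines. This is a Python sketch assuming NumPy and categories coded $1, \ldots, k$; the function name and the scheme labels are hypothetical.

```python
import numpy as np


def disagreement_weights(k, scheme):
    """k x k matrix U = (u_ij) for categories 1..k (hypothetical helper)."""
    i, j = np.indices((k, k))
    d = np.abs(i - j)                    # |i - j|; 0-based indexing does not change differences
    if scheme == "quadratic":            # Eq. (3)
        return d ** 2 / (k - 1) ** 2
    if scheme == "linear":               # Eq. (4)
        return d / (k - 1)
    if scheme == "unweighted":           # Eq. (5): recovers Cohen's kappa
        return (d > 0).astype(float)
    raise ValueError(f"unknown scheme: {scheme}")


print(disagreement_weights(4, "quadratic"))
```

All three matrices have a zero diagonal, are symmetric, and take values in (0, 1] off the diagonal, as required for U.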

Recent discussions on the choice, use, and interpretation of the different weighting schemes can be found in Warrens (2013) and Kvålseth (2018). The main arguments in favor of quadratic and linear weights are essentially theoretical. Quadratic weights lead to the interpretation of the weighted kappa as an intraclass correlation coefficient, see Schuster (2004), while linear weights allow us to express the weighted kappa as a weighted average of the kappas of the $2 \times 2$ tables obtained by collapsing adjacent categories, see Vanbelle and Albert (2009). However, undesirable behaviors of the weighted kappa can be observed for some data sets under both choices of weights, so the interpretation of the value of kappa is not easy in general. We will come back to this issue later in the paper, when we use Markov moves to find the maximum agreement. Another interesting interpretation of linear and quadratic weights is discussed in Li (2016), where the matrix W is decomposed into a sum of suitable rank-one matrices.
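For linear weights, the collapsing interpretation can be checked numerically. Since $|i-j|$ equals the number of cut points $c \in \{1, \ldots, k-1\}$ that separate categories i and j, both the observed and the expected weighted disagreement split into sums of the disagreements $O_c$ and $E_c$ of the $2 \times 2$ tables obtained by collapsing $\{1, \ldots, c\}$ versus $\{c+1, \ldots, k\}$. Hence $\kappa_l = \sum_c E_c \kappa_c / \sum_c E_c$ with $\kappa_c = 1 - O_c/E_c$. The Python sketch below assumes NumPy and $E_c > 0$ for every cut; the helper names are hypothetical.

```python
import numpy as np


def linear_kappa_direct(counts):
    """kappa_l from the second expression of Eq. (2) with the weights of Eq. (4)."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    k = p.shape[0]
    u = np.abs(np.subtract.outer(np.arange(k), np.arange(k))) / (k - 1)
    r, c = p.sum(axis=1), p.sum(axis=0)
    return float(1 - np.sum(u * p) / np.sum(u * np.outer(r, c)))


def linear_kappa_by_collapsing(counts):
    """kappa_l as the E_c-weighted average of the collapsed 2 x 2 kappas."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    k = p.shape[0]
    kappas, exp_dis = [], []
    for cut in range(1, k):              # {1..cut} versus {cut+1..k}
        q = np.array([[p[:cut, :cut].sum(), p[:cut, cut:].sum()],
                      [p[cut:, :cut].sum(), p[cut:, cut:].sum()]])
        r, c = q.sum(axis=1), q.sum(axis=0)
        e = r[0] * c[1] + r[1] * c[0]    # expected disagreement E_c (assumed > 0)
        o = q[0, 1] + q[1, 0]            # observed disagreement O_c
        kappas.append(1 - o / e)
        exp_dis.append(e)
    return float(np.dot(exp_dis, kappas) / np.sum(exp_dis))


table = [[20, 5, 1], [4, 15, 6], [0, 3, 25]]
print(linear_kappa_direct(table), linear_kappa_by_collapsing(table))
```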

In order to illustrate our theory, we also consider a square-root version of the weights, namely:

$$u_{ij} = \frac{\sqrt{|i-j|}}{\sqrt{k-1}} \tag{6}$$

As a preliminary remark, notice that the linear weights in Eq. (4) and the square-root weights in Eq. (6) define a distance on $\mathbb{R}$, while the quadratic weights in Eq. (3) do not, because the triangle inequality is not satisfied. Functions like the quadratic weights are usually called dissimilarities. In this paper, when the matrix U is a distance matrix, we call the weights “distance weights,” and we refer to the weights in Eq. (6) as the sqrt weights. Moreover, we use the notation $\kappa_q$, $\kappa_l$, and $\kappa_s$ when quadratic, linear, or sqrt weights are used, respectively, while we denote by $\kappa_w$ the kappa with a general weighting scheme.

Observe that the distance defined by the linear weights is, up to the scaling factor $1/(k-1)$, the usual Euclidean distance on $\mathbb{R}$, and it behaves in a special way with respect to the triangle inequality: for $i < j < h$, the triangle inequality becomes an equality, $u_{ih} = u_{ij} + u_{jh}$. We will exploit this property later in the paper.
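The metric properties stated above can be verified by brute force on the ground set $\{1, \ldots, k\}$. The Python sketch below uses only the standard library; the helper names are hypothetical. It builds the quadratic, linear, and sqrt weights of Eqs. (3), (4), and (6), checks the triangle inequality on every triple, and confirms that linear weights turn it into an equality for $i < j < h$.

```python
import itertools
import math


def weights(k, scheme):
    """Nested-list U for categories 1..k (hypothetical helper)."""
    f = {"quadratic": lambda d: d ** 2 / (k - 1) ** 2,            # Eq. (3)
         "linear": lambda d: d / (k - 1),                          # Eq. (4)
         "sqrt": lambda d: math.sqrt(d) / math.sqrt(k - 1)}[scheme]  # Eq. (6)
    return [[f(abs(a - b)) for b in range(k)] for a in range(k)]


def satisfies_triangle(u, tol=1e-12):
    """True if u[a][c] <= u[a][b] + u[b][c] for every triple (a, b, c)."""
    k = len(u)
    return all(u[a][c] <= u[a][b] + u[b][c] + tol
               for a, b, c in itertools.product(range(k), repeat=3))


k = 5
for s in ("quadratic", "linear", "sqrt"):
    print(s, satisfies_triangle(weights(k, s)))

lin = weights(k, "linear")
print(all(abs(lin[i][h] - lin[i][j] - lin[j][h]) < 1e-12
          for i, j, h in itertools.combinations(range(k), 3)))
```

For the quadratic weights the check fails already at (1, 2, 3): $u_{13} = 4/(k-1)^2$ exceeds $u_{12} + u_{23} = 2/(k-1)^2$.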

In the multi-rater setting, we consider the ratings of r raters as r random variables $X_1, \ldots, X_r$ on the same set $\{1, \ldots, k\}$, or more generally on $\{x_1, \ldots, x_k\}$. The observed data form a $k^r$ table.
We denote by $p_{i_1 \ldots i_r}$ the probability of the cell $(i_1, \ldots, i_r)$, and by $n_{i_1 \ldots i_r}$ the corresponding observed count on a sample of size N. Moreover, we denote by $p^{(u)}$ the one-dimensional marginal distribution of $X_u$, and by $p^{(uv)}$ the two-dimensional marginal distribution of the pair $(X_u, X_v)$.

To measure agreement in the multi-rater setting, it is customary to use Conger's $\kappa_C$, originally introduced in Conger (1980) and then re-analyzed in several papers; see, e.g., Vanbelle (2019). Conger's $\kappa_C$ is based on a pairwise analysis of rater agreement. It is defined as:

$$\kappa_C = \frac{p_o - p_e}{1 - p_e} \tag{7}$$

where $p_o$ is the mean proportion of agreement between all $r(r-1)/2$ pairs of raters, and similarly $p_e$ is the mean proportion of expected agreement between all $r(r-1)/2$ pairs of raters under independence. In formulas,

$$p_o = \frac{2}{r(r-1)} \sum_{u,v \in \{1, \ldots, r\},\, u<v} \; \sum_{i=1}^k p^{(uv)}_{ii} \tag{8}$$

and

$$p_e = \frac{2}{r(r-1)} \sum_{u,v \in \{1, \ldots, r\},\, u<v} \; \sum_{i=1}^k p^{(u)}_{i} p^{(v)}_{i} . \tag{9}$$
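The following Python sketch makes Eqs. (7)–(9) concrete. It computes Conger's $\kappa_C$ from a $k^r$ table of counts, using observed proportions in place of the probabilities. It assumes NumPy. The function name `conger_kappa` is ours and does not come from the paper or from any agreement package. With two raters, the index reduces to Cohen's kappa.

```python
import itertools

import numpy as np


def conger_kappa(n):
    """Conger's kappa, Eqs. (7)-(9), from an r-way k x ... x k table of counts n."""
    n = np.asarray(n, dtype=float)
    r = n.ndim
    p = n / n.sum()  # observed proportions in place of p_{i_1...i_r}
    # one-way marginals p^(u): sum over every axis except u
    p1 = [p.sum(axis=tuple(a for a in range(r) if a != u)) for u in range(r)]
    pairs = list(itertools.combinations(range(r), 2))  # the r(r-1)/2 pairs u < v
    p_o = p_e = 0.0
    for u, v in pairs:
        # two-way marginal p^(uv): sum over every axis except u and v
        puv = p.sum(axis=tuple(a for a in range(r) if a not in (u, v)))
        p_o += np.trace(puv)         # sum_i p^(uv)_ii
        p_e += np.dot(p1[u], p1[v])  # sum_i p^(u)_i p^(v)_i
    p_o /= len(pairs)  # mean over pairs, i.e. the factor 2 / (r(r-1))
    p_e /= len(pairs)
    return (p_o - p_e) / (1 - p_e)


# Two raters: the index reduces to Cohen's kappa.
print(conger_kappa([[20, 5], [10, 15]]))  # approximately 0.4
```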

Since Conger's $\kappa_C$ is based on the two-way margins of the $k^r$ table, a weighted version of Conger's kappa is easily defined as follows:

$$\kappa_{C,w} = \frac{p_{o,w} - p_{e,w}}{1 - p_{e,w}} \tag{10}$$

with

$$p_{o,w} = \frac{2}{r(r-1)} \sum_{u,v \in \{1, \ldots, r\},\, u<v} \; \sum_{i,j=1}^k w_{ij} p^{(uv)}_{ij} \tag{11}$$

and

$$p_{e,w} = \frac{2}{r(r-1)} \sum_{u,v \in \{1, \ldots, r\},\, u<v} \; \sum_{i,j=1}^k w_{ij} p^{(u)}_{i} p^{(v)}_{j} \tag{12}$$

In the above definition, the weights are the same for all pairs u, v of raters, but the definition can easily be extended to different weights on different two-way margins. Also notice that Conger's $\kappa_C$ can be defined in the general case of g-wise agreement, as in the original paper Conger (1980). This is done by keeping the definition of $\kappa_C$ in Eq. (7) unchanged and computing the observed and expected agreement in Eqs. (8) and (9) on the g-way marginal tables instead of the two-way tables. However, when the weighted version $\kappa_{C,w}$ in Eqs. (10)–(12) is considered, pairwise agreement is the most reasonable choice, and the extension to g-wise agreement would require new definitions of the weighting schemes.
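A weighted variant following Eqs. (10)–(12) needs only a weight matrix, plus a matrix product for the expected term. The sketch below is an illustration under stated assumptions. It uses the same weights for all pairs of raters, as in the definition above. The helper `quadratic_weights` is hypothetical and builds one common weighting scheme; any other scheme can be passed as a $k \times k$ array.

```python
import itertools

import numpy as np


def weighted_conger_kappa(n, w):
    """Weighted Conger's kappa, Eqs. (10)-(12), with one k x k weight matrix w for all pairs."""
    n = np.asarray(n, dtype=float)
    w = np.asarray(w, dtype=float)
    r = n.ndim
    p = n / n.sum()
    p1 = [p.sum(axis=tuple(a for a in range(r) if a != u)) for u in range(r)]
    pairs = list(itertools.combinations(range(r), 2))
    p_ow = p_ew = 0.0
    for u, v in pairs:
        puv = p.sum(axis=tuple(a for a in range(r) if a not in (u, v)))
        p_ow += np.sum(w * puv)    # sum_ij w_ij p^(uv)_ij
        p_ew += p1[u] @ w @ p1[v]  # sum_ij w_ij p^(u)_i p^(v)_j
    p_ow /= len(pairs)
    p_ew /= len(pairs)
    return (p_ow - p_ew) / (1 - p_ew)


def quadratic_weights(k):
    """Quadratic agreement weights w_ij = 1 - (i - j)^2 / (k - 1)^2 (an illustrative choice)."""
    i, j = np.indices((k, k))
    return 1 - (i - j) ** 2 / (k - 1) ** 2
```

With the identity matrix as weights, the function returns the unweighted $\kappa_C$ of Eq. (7).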

2. Markov Bases

In this section, we introduce the main tools from algebraic statistics needed in our framework. In particular, we define the notion of Markov basis and we compute the relevant Markov bases for the rater agreement problems.

Let n be an observed contingency table, possibly multi-way. An integer-valued statistic is a function $T : \mathbb{N}^{k^r} \longrightarrow \mathbb{N}^s$. Since we need to compute the maximum agreement with fixed marginal distributions, we are particularly interested in the function

$$T : n \longmapsto \big( (n_{i+})_{i=1,\ldots,k}, (n_{+j})_{j=1,\ldots,k} \big) \tag{13}$$

in the two-way case and

$$T : n \longmapsto \big( (n^{(1)}_i)_{i=1,\ldots,k}, \ldots, (n^{(r)}_i)_{i=1,\ldots,k} \big) \tag{14}$$

in the general multi-rater case, where $n^{(s)}_i$ is the i-th entry of the marginal distribution of the s-th rater.

Definition 1.

Given a statistic T, the fiber (or reference set) of a contingency table n is the set

$$\mathcal{F}_T(n) = \{ n' \in \mathbb{N}^{k^r} \mid T(n') = T(n) \} . \tag{15}$$
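For intuition on Eqs. (13)–(15), the statistic T and a fiber membership test can be sketched in a few lines of Python. The helper names `margins` and `in_fiber` are hypothetical. Integer-valued input is assumed rather than checked.

```python
import numpy as np


def margins(n):
    """The statistic T of Eq. (14): the r one-way marginal count vectors of a k^r table.

    For r = 2 this returns [row sums, column sums], i.e. Eq. (13).
    """
    n = np.asarray(n)
    r = n.ndim
    return [n.sum(axis=tuple(a for a in range(r) if a != u)) for u in range(r)]


def in_fiber(n, n_prime):
    """True if n_prime (assumed integer-valued) lies in the fiber F_T(n) of Eq. (15)."""
    n, n_prime = np.asarray(n), np.asarray(n_prime)
    if n.shape != n_prime.shape or (n_prime < 0).any():
        return False
    return all(np.array_equal(a, b) for a, b in zip(margins(n), margins(n_prime)))
```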

Definition 2.

A Markov move for the statistic T is an integer-valued table m such that $T(m) = 0$.

Definition 3.

A Markov basis for the fiber $\mathcal{F}_T(n)$ of a table n is a set of Markov moves

$$\mathcal{M}_{n,T} = \{ m^{(1)}, \ldots, m^{(\ell)} \}$$

such that for each pair of tables $n', n'' \in \mathcal{F}_T(n)$ there exists a sequence of moves $(m^{(i_1)}, \ldots, m^{(i_Q)})$ such that

  1. $n'' = n' + \sum_{j=1}^{Q} m^{(i_j)}$;

  2. $n' + \sum_{j=1}^{q} m^{(i_j)} \ge 0$ for all $q = 1, \ldots, Q$.

In words, a Markov basis is a set of moves that connects any two tables of the fiber $\mathcal{F}_T(n)$ through a path whose intermediate tables are all nonnegative. In algebraic statistics, this is the main tool for defining a Metropolis-like Markov chain Monte Carlo algorithm for exact inference on contingency tables. For a comprehensive introduction to Markov bases and their use in statistics, the reader can refer to the books Sullivant (2018) and Aoki et al. (2012).
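Conditions 1 and 2 of Definition 3 can be checked mechanically. The hypothetical `is_valid_path` below adds the moves one at a time. It rejects the path as soon as an intermediate table has a negative entry. The example shows that the order of the moves matters even when their sum is the same.

```python
import numpy as np


def is_valid_path(n_start, n_end, moves):
    """Check conditions 1 and 2 of Definition 3 for a given sequence of moves."""
    current = np.array(n_start, dtype=int)
    for m in moves:
        current = current + np.asarray(m, dtype=int)
        if (current < 0).any():  # condition 2 fails at this intermediate step
            return False
    return np.array_equal(current, np.asarray(n_end))  # condition 1


m = np.array([[-1, 1], [1, -1]])
print(is_valid_path([[1, 0], [0, 1]], [[1, 0], [0, 1]], [m, -m]))  # True
print(is_valid_path([[1, 0], [0, 1]], [[1, 0], [0, 1]], [-m, m]))  # False: first step goes negative
```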

Since we are particularly interested in the computation of the maximum agreement given the marginal distributions, we need the Markov bases for the statistic T in Eqs. (13) and (14).

Following Diaconis and Sturmfels (1998), in the general case the computation of a Markov basis requires symbolic computation and is currently not feasible for large tables. However, the Markov bases for the fibers considered in this paper can be characterized theoretically, and therefore no symbolic computation is involved. For an overview of the computation of Markov bases through symbolic software, the underlying computational problems, and the current limitations for large tables, the reader can refer to Aoki et al. (2012).

As a first step, we recall a result from Diaconis and Sturmfels (1998) about the Markov basis for two-way tables with fixed margins.

Definition 4.

Let $i, i'$ be two distinct row indices and $j, j'$ be two distinct column indices. A basic move is a move m such that

$$m_{ij} = m_{i'j'} = +1, \qquad m_{ij'} = m_{i'j} = -1$$

and is 0 otherwise.
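The basic moves of Definition 4 can be listed exhaustively. In this sketch (the name `basic_moves_2d` is ours), each unordered pair of rows is combined with each ordered pair of columns, so both m and $-m$ appear. A $k \times k$ table then has $k^2(k-1)^2/2$ basic moves, that is, 72 moves for the 4 categories of Fig. 2.

```python
import itertools

import numpy as np


def basic_moves_2d(k):
    """All distinct basic moves of Definition 4 for a k x k table."""
    moves = []
    for i, i2 in itertools.combinations(range(k), 2):      # unordered row pair {i, i'}
        for j, j2 in itertools.permutations(range(k), 2):  # ordered column pair: both signs
            m = np.zeros((k, k), dtype=int)
            m[i, j] = m[i2, j2] = 1
            m[i, j2] = m[i2, j] = -1
            moves.append(m)
    return moves


print(len(basic_moves_2d(4)))  # 72
```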

Some examples of basic moves in the case of 4 categories are given in Fig. 2. These moves behave differently in terms of agreement. We discuss all these types of moves in the next section.

Figure 2. Four basic moves for the two-rater problem. a Two nonzero elements on the diagonal; b one nonzero element on the diagonal, the move lies on the upper triangle; c one nonzero element on the diagonal, the move lies on both the upper and the lower triangle; d no nonzero elements on the diagonal.

Proposition 1.

The set of basic moves in Definition 4 is a Markov basis for the fiber in Eq. (15) for the two-rater problem.
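Proposition 1 can be checked by brute force on small examples. The sketch below is an illustration and not part of the paper's method. It enumerates every table with given margins, then runs a breadth-first search that uses only those basic moves which keep all entries nonnegative. The helper names are hypothetical, and the enumeration is practical only for very small tables.

```python
import itertools
from collections import deque

import numpy as np


def basic_moves_2d(k, l):
    """All basic moves of Definition 4 for a k x l table."""
    moves = []
    for i, i2 in itertools.combinations(range(k), 2):
        for j, j2 in itertools.permutations(range(l), 2):
            m = np.zeros((k, l), dtype=int)
            m[i, j] = m[i2, j2] = 1
            m[i, j2] = m[i2, j] = -1
            moves.append(m)
    return moves


def fiber_2d(rows, cols):
    """Brute-force list of all nonnegative integer tables with the given margins."""
    tables = []

    def fill(prefix, remaining):
        if len(prefix) == len(rows) - 1:  # the last row is forced by the column sums
            if sum(remaining) == rows[-1]:
                tables.append(np.array(prefix + [list(remaining)], dtype=int))
            return
        for row in itertools.product(*(range(c + 1) for c in remaining)):
            if sum(row) == rows[len(prefix)]:
                fill(prefix + [list(row)], [c - x for c, x in zip(remaining, row)])

    fill([], list(cols))
    return tables


def key(t):
    return tuple(t.ravel().tolist())


def is_connected(tables, moves):
    """Breadth-first search from one table, stepping only with moves that keep entries >= 0."""
    seen = {key(tables[0])}
    queue = deque([tables[0]])
    while queue:
        t = queue.popleft()
        for m in moves:
            s = t + m
            if (s >= 0).all() and key(s) not in seen:
                seen.add(key(s))
                queue.append(s)
    return seen == {key(t) for t in tables}


fiber = fiber_2d([2, 1, 1], [1, 2, 1])
print(len(fiber), is_connected(fiber, basic_moves_2d(3, 3)))  # 7 True
```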

The basic moves in the multi-rater setting are defined by extending the previous definition to more than two dimensions. Informally, one places two $+1$'s in two cells that differ in at least two coordinates and then arranges the $-1$'s so as to obtain the correct projections on all two-way margins. More formally, we can state the following definition.

Definition 5.

A basic move m for the multi-rater problem is a table with 4 nonzero entries:

  • m is equal to $+1$ in $(i_1, \ldots, i_r)$ and in $(i'_1, \ldots, i'_r)$, two cells with at least two different indices. Without loss of generality, suppose that the distinct indices are $i_1, \ldots, i_q$, $q \ge 2$, that is, $i_s \ne i'_s$ for $s \le q$ and $i_s = i'_s$ for $s > q$;

  • m is equal to $-1$ in $(j_1, \ldots, j_q, i_{q+1}, \ldots, i_r)$ and in $(j'_1, \ldots, j'_q, i_{q+1}, \ldots, i_r)$ with

    (i) $j_s = i_s$, $j'_s = i'_s$ for $s \in \mathcal{S}$;

    (ii) $j_s = i'_s$, $j'_s = i_s$ for $s \in \{1, \ldots, q\} \setminus \mathcal{S}$,

    where $\mathcal{S}$ is a non-empty proper subset of $\{1, \ldots, q\}$.
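Definition 5 translates directly into code. The sketch below is a hypothetical helper that codes categories as 0, ..., k-1. It does not assume that the distinct indices come first. Instead, it detects the coordinates where the two $+1$ cells differ and takes $\mathcal{S}$ as a subset of those coordinates. It rejects the full set, because in that case the $-1$ entries would cancel the $+1$ entries.

```python
import numpy as np


def basic_move(k, plus1, plus2, S):
    """Basic move of Definition 5 in a k^r table (categories coded 0, ..., k-1).

    plus1, plus2: the two cells carrying +1.
    S: coordinates, among those where plus1 and plus2 differ, on which the first
       -1 cell copies plus1 (rule (i)); on the remaining ones it copies plus2 (rule (ii)).
    """
    r = len(plus1)
    diff = [s for s in range(r) if plus1[s] != plus2[s]]  # the q distinct indices
    if len(diff) < 2 or not S or not set(S) < set(diff):
        raise ValueError("need q >= 2 and S a non-empty proper subset of the distinct indices")
    minus1, minus2 = list(plus1), list(plus2)
    for s in diff:
        if s not in S:  # rule (ii): the two -1 cells swap the values of plus1 and plus2
            minus1[s], minus2[s] = plus2[s], plus1[s]
    m = np.zeros((k,) * r, dtype=int)
    m[tuple(plus1)] += 1
    m[tuple(plus2)] += 1
    m[tuple(minus1)] -= 1
    m[tuple(minus2)] -= 1
    return m


# Three raters, three categories: +1 in (0, 0, 0) and (1, 1, 1), S = {first coordinate};
# the -1 entries land in (0, 1, 1) and (1, 0, 0).
m = basic_move(3, (0, 0, 0), (1, 1, 1), {0})
```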

It is easy to see that this definition reduces to Definition 4 when $r = 2$. Two examples of basic moves in the $3^3$ case are illustrated in Fig. 3.

Figure 3. Two basic moves for the three-rater problem. A move of type (a) and a move of type (b) from Proposition 2.

The fact that basic moves suffice to connect the fiber in Eq. (15) can be derived from the theory of toric fiber products developed in Sullivant (2007). This allows us to avoid symbolic computations and makes the relevant Markov bases available also for large tables.

Proposition 2.

The set of basic moves in Definition 5 is a Markov basis for the fiber in Eq. (15) when $r > 2$.

Proof.

First, note that all basic moves in Definition 5 are in the kernel of the marginalization map T.

Since for $r = 2$ the basic moves in Definition 5 coincide with the basic moves in Definition 4, the result holds for $r = 2$ by Proposition 1. We proceed by induction on r. Suppose that the result holds for $r-1$ raters; we prove it for r raters. We apply Theorem 13 in Sullivant (2007). Given a Markov basis $\mathcal{M}_{r-1}$ for the problem with $r-1$ raters, a Markov basis $\mathcal{M}_r$ for the problem with r raters is the union of the following sets of moves:

  (a) split each move of $\mathcal{M}_{r-1}$ by putting one $+1$ and one $-1$ at a given level h of $X_r$ ($h = 1, \ldots, k$) and the other $+1$ and $-1$ at a level $h'$ of $X_r$ ($h' = h, \ldots, k$);

  (b) for any two distinct cells $(i_1, \ldots, i_{r-1})$ and $(i'_1, \ldots, i'_{r-1})$ of the $(r-1)$-dimensional table, and for any two distinct levels $h, h'$ of $X_r$, take the move with $+1$ in $(i_1, \ldots, i_{r-1}, h)$ and in $(i'_1, \ldots, i'_{r-1}, h')$, and with $-1$ in $(i_1, \ldots, i_{r-1}, h')$ and in $(i'_1, \ldots, i'_{r-1}, h)$.

Since all the moves defined in items (a) and (b) above are basic moves, the result is proved. $\square$
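The following sketch illustrates item (b). It builds such a move for a hypothetical three-rater table and checks that all its one-way margins vanish, so the move lies in the kernel of T in Eq. (14). Its two $+1$ cells differ in the last coordinate and in at least one of the others. The move is therefore a basic move in the sense of Definition 5, with $\mathcal{S}$ equal to the set of the other differing coordinates.

```python
import numpy as np


def type_b_move(k, cell, cell2, h, h2):
    """Move of item (b): +1 in (cell, h), (cell2, h2) and -1 in (cell, h2), (cell2, h)."""
    r = len(cell) + 1
    m = np.zeros((k,) * r, dtype=int)
    m[tuple(cell) + (h,)] += 1
    m[tuple(cell2) + (h2,)] += 1
    m[tuple(cell) + (h2,)] -= 1
    m[tuple(cell2) + (h,)] -= 1
    return m


def one_way_margins_vanish(m):
    """True if every one-way margin of m is zero, i.e. T(m) = 0 for T in Eq. (14)."""
    r = m.ndim
    return all(not m.sum(axis=tuple(a for a in range(r) if a != u)).any() for u in range(r))


m = type_b_move(3, (0, 1), (2, 2), 0, 1)  # r = 3 raters, k = 3 categories
print(np.count_nonzero(m), one_way_margins_vanish(m))  # 4 True
```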

As noted in the Introduction, Markov bases are usually defined in algebraic statistics in order to perform exact tests with a Metropolis–Hastings algorithm, and therefore to generate all contingency tables with the same value of the sufficient statistic as the observed table. Here, we simply use Markov bases to reach all the tables with fixed margins.
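A simple way to move through the tables with fixed margins is a random walk driven by the basic moves. The sketch below draws a basic move uniformly at random and applies it only if the result is nonnegative. It is a plain random walk with no target distribution. It is neither a Metropolis–Hastings sampler nor the simulated annealing algorithm of the paper. NumPy's `default_rng` and the function name are our choices.

```python
import numpy as np


def random_walk_fiber(n, steps, seed=0):
    """Random walk on the fiber of a two-way table n driven by basic moves (Definition 4)."""
    rng = np.random.default_rng(seed)
    current = np.array(n, dtype=int)
    k, l = current.shape
    visited = [current.copy()]
    for _ in range(steps):
        # ordered distinct pairs of rows and columns: every basic move is equally likely
        i, i2 = rng.choice(k, size=2, replace=False)
        j, j2 = rng.choice(l, size=2, replace=False)
        m = np.zeros_like(current)
        m[i, j] = m[i2, j2] = 1
        m[i, j2] = m[i2, j] = -1
        if (current + m >= 0).all():  # reject moves that would leave the fiber
            current = current + m
        visited.append(current.copy())
    return visited


n = np.array([[3, 1, 0], [0, 2, 1], [1, 0, 2]])
walk = random_walk_fiber(n, steps=200)
```

Every table in `walk` has the same row and column sums as `n`, so the walk never leaves the fiber in Eq. (15).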

3. The Effect of the Markov Moves on the Kappa Indices

Now, we use the basic moves of the Markov bases to better understand the meaning of the weighted kappa. The basic idea is to apply the definition of Markov basis to analyze rater agreement in connection with the weighting schemes. We show here that most basic moves have a well-defined effect on the kappa indices, and we analyze how rater agreement changes when a Markov move is applied. Moreover, the configuration of maximum agreement can be reached from the observed table with a finite number of Markov moves, and the analysis with basic moves helps us understand the structure of the configurations with maximum agreement.

Since our analysis is performed with fixed marginal distributions, the expected agreement is fixed, and the kappa indices are monotonic functions of the observed agreement. So, to simplify the formulas, we consider the quantities

(16) A o , w ( n ) = 1 N i , j = 1 k w ij n ij \documentclass[12pt]{minimal}\usepackage{amsmath}\usepackage{wasysym}\usepackage{amsfonts}\usepackage{amssymb}\usepackage{amsbsy}\usepackage{mathrsfs}\usepackage{upgreek}\setlength{\oddsidemargin}{-69pt}\begin{document}$$\begin{aligned} A_{o,w}(n) = \frac{1}{N} \sum _{i,j=1}^k w_{ij}n_{ij} \end{aligned}$$\end{document}

in the two-rater setting, and

$$A_{o,w}(n) = \frac{2}{r(r-1)} \sum_{u,v \in \{1, \ldots, r\},\, u<v} \frac{1}{N} \sum_{i,j=1}^k w_{ij}n^{(uv)}_{ij} \tag{17}$$

in the multi-rater setting.
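To make Eqs. (16) and (17) concrete, the following is a minimal sketch, assuming the table is stored as a NumPy array (a $k\times k$ array for two raters, an $r$-way array of shape $(k,\ldots,k)$ for $r$ raters) and that the weights are passed as a $k\times k$ matrix. The helper names and the linear weight formula $w_{ij}=1-|i-j|/(k-1)$ used in the usage lines are illustrative assumptions, not code from the paper.

```python
import itertools
import numpy as np

def linear_agreement_weights(k):
    """Hypothetical helper: linear agreement weights w_ij = 1 - |i - j| / (k - 1)."""
    idx = np.arange(k)
    return 1.0 - np.abs(idx[:, None] - idx[None, :]) / (k - 1)

def observed_agreement(n, w):
    """Eq. (16): A_{o,w}(n) = (1/N) * sum_{i,j} w_ij * n_ij for a k x k table."""
    n = np.asarray(n, dtype=float)
    return float((w * n).sum() / n.sum())

def observed_agreement_multi(n, w):
    """Eq. (17): average of Eq. (16) over all pairs of raters u < v.

    n is an r-way array of shape (k, ..., k). The two-way margin n^(uv)
    is obtained by summing n over the axes of all the other raters.
    """
    n = np.asarray(n, dtype=float)
    pairs = list(itertools.combinations(range(n.ndim), 2))
    total = 0.0
    for u, v in pairs:
        others = tuple(a for a in range(n.ndim) if a not in (u, v))
        n_uv = n.sum(axis=others)   # k x k table: rows = rater u, cols = rater v
        total += observed_agreement(n_uv, w)
    return total / len(pairs)       # len(pairs) = r(r-1)/2

# Usage on a tiny hypothetical three-rater table with k = 3 levels
w = linear_agreement_weights(3)
n3 = np.zeros((3, 3, 3))
n3[0, 0, 0] = 1   # all three raters choose the first level
n3[0, 0, 1] = 1   # the third rater disagrees by one level
print(observed_agreement_multi(n3, w))
```

Since the margins are fixed, these quantities differ from the corresponding kappa indices only by a fixed increasing transformation, which is why the results below are stated in terms of $A_{o,w}$.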

Let us start with some results in the two-rater setting.

Lemma 1

Let n be an observed agreement table, let $i \ne j$ be two indices, and let m be the basic move with

$$m_{ii}=m_{jj}=+1, \qquad m_{ij}=m_{ji}=-1 \, .$$

If $n_{ij}>0$ and $n_{ji}>0$, then

$$A_{o,w}(n+m) \ge A_{o,w}(n) \, . \tag{18}$$

Proof.

From Eq. (16) and using the disagreement weights, we get

$$A_{o,w}(n+m) - A_{o,w}(n) = A_{o,w}(m) = - \frac{1}{N} \sum_{i,j=1}^k u_{ij}m_{ij} = \frac{2}{N} u_{ij} \ge 0 \, .$$

$\square$

Lemma 1 holds for all weighting schemes and states that, if there are positive counts in two symmetric off-diagonal cells, then a simple move always yields a table with the same margins and with observed agreement at least as high (strictly higher when $u_{ij}>0$). This is quite intuitive, since Eq. (18) roughly says that moving counts onto the diagonal increases the observed agreement.
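Lemma 1 can be checked numerically. The sketch below applies the symmetric move with $i=1$, $j=2$ (indices 0 and 1 in the code) to a hypothetical $3\times 3$ table and compares the gain in observed agreement with $2u_{ij}/N$. It assumes quadratic agreement weights and disagreement weights $u_{ij}=1-w_{ij}$; both choices are illustrative assumptions, and any symmetric weights with $w_{ii}=1$ would do.

```python
import numpy as np

def quadratic_agreement_weights(k):
    """Quadratic agreement weights w_ij = 1 - (i - j)^2 / (k - 1)^2."""
    idx = np.arange(k)
    return 1.0 - (idx[:, None] - idx[None, :]) ** 2 / (k - 1) ** 2

def symmetric_move(k, i, j):
    """Basic move of Lemma 1: +1 on (i,i) and (j,j), -1 on (i,j) and (j,i)."""
    m = np.zeros((k, k), dtype=int)
    m[i, i] = m[j, j] = 1
    m[i, j] = m[j, i] = -1
    return m

def observed_agreement(n, w):
    """Eq. (16) for a two-rater table."""
    return float((w * n).sum() / n.sum())

n = np.array([[3, 2, 0],
              [1, 4, 1],
              [0, 2, 2]])            # hypothetical table with n_01 > 0 and n_10 > 0
w = quadratic_agreement_weights(3)
n_new = n + symmetric_move(3, 0, 1)  # same margins, still nonnegative

gain = observed_agreement(n_new, w) - observed_agreement(n, w)
u_01 = 1.0 - w[0, 1]                 # disagreement weight, assumed u_ij = 1 - w_ij
print(gain, 2 * u_01 / n.sum())      # Lemma 1: the two numbers coincide
```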

Nonetheless, apart from the symmetric basic moves displayed in Fig. 2a, the other types of basic moves do not have a common behavior in terms of observed agreement. Recall that, earlier in the paper, we noticed that some weighting schemes define a distance on the ground set $\{x_1, \ldots, x_k\}$ while other schemes do not. The following proposition states a partial result when only one cell of the diagonal is involved in the basic move.

Proposition 3.

Let n be an observed agreement table, let $i< j < h$ be three indices, and let m be the basic move with

$$m_{ih}=m_{jj}=+1, \qquad m_{ij}=m_{jh}=-1 \, .$$

If $n_{ij}>0$ and $n_{jh}>0$ and a distance weighting scheme is used, then

$$A_{o,w}(n+m) \ge A_{o,w}(n) \, .$$

The same holds if $i>j>h$.

Proof.

Let us consider the case $i< j < h$. (The other case has a similar proof.)

From Eq. (16) and using the disagreement weights, we get

$$\begin{aligned} A_{o,w}(n+m) - A_{o,w}(n) &= A_{o,w}(m) = - \frac{1}{N} \sum_{i,j=1}^k u_{ij}m_{ij} \\ &= \frac{1}{N} \left( u_{ij} + u_{jh}- u_{ih} \right) \ge 0 \end{aligned}$$

by virtue of the triangle inequality. $\square$

In Proposition 3, the move m has one nonzero element on the diagonal. In the case $i< j < h$ the move lies in the upper triangle of the table, while in the case $i>j>h$ it lies in the lower one.

Some remarks are now in order. First, note that in Proposition 3 the assumption of distance weights is essential: weighting schemes that are derived from a distance and those that are not can lead to opposite behaviors of the weighted kappa. For instance, let us consider the observed table below:

$$n=\begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 1 & 0 \\ 0 & 0 & 4 & 1 \\ 0 & 0 & 0 & 4 \end{pmatrix} \, .$$

We can apply the move

$$m=\begin{pmatrix} 0 & 0 & 0 & 0 \\ 0 & 0 & -1 & +1 \\ 0 & 0 & +1 & -1 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

and we obtain

$$n'=n+m=\begin{pmatrix} 4 & 0 & 0 & 0 \\ 0 & 4 & 0 & 1 \\ 0 & 0 & 5 & 0 \\ 0 & 0 & 0 & 4 \end{pmatrix} \, .$$

Comparing the values of $\kappa_w$ for $n$ and $n'$, we note that:

  • With a distance weighting scheme we have $\kappa_w(n') \ge \kappa_w(n)$ by virtue of Proposition 3, with strict inequality whenever $u_{23}+u_{34}>u_{24}$;

  • With quadratic weights we have $\kappa_q(n')<\kappa_q(n)$;

  • With linear weights we get $\kappa_l(n')=\kappa_l(n)$.

Thus, with distance weights, moving counts onto the diagonal is never penalized, even when some other counts are pushed farther from the diagonal, while with quadratic weights a certain amount of moderate disagreement is preferred to a small amount of strong disagreement.
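The three bullet points can be reproduced with a short computation of Cohen's weighted kappa, $\kappa_w=(A_o-A_e)/(1-A_e)$ with $A_e=\frac{1}{N^2}\sum_{i,j} w_{ij} n_{i+}n_{+j}$, for $n$ and $n'$. This is a sketch under stated assumptions: the disagreement weights are taken as $u_{ij}=d_{ij}$ (linear), $d_{ij}^2$ (quadratic) and $\sqrt{d_{ij}}$ ("sqrt", used here as an example of a distance scheme with strict triangle inequality), where $d_{ij}=|i-j|/(k-1)$; the exact form of the sqrt weights in the paper is not restated in this section, so that choice is an assumption.

```python
import numpy as np

def agreement_weights(k, scheme):
    """Agreement weights w_ij = 1 - u_ij; the 'sqrt' form is an assumption."""
    d = np.abs(np.subtract.outer(np.arange(k), np.arange(k))) / (k - 1)
    if scheme == "linear":
        u = d
    elif scheme == "quadratic":
        u = d ** 2
    elif scheme == "sqrt":
        u = np.sqrt(d)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return 1.0 - u

def weighted_kappa(n, w):
    """Cohen's weighted kappa (A_o - A_e) / (1 - A_e) for a two-rater table."""
    n = np.asarray(n, dtype=float)
    N = n.sum()
    a_obs = (w * n).sum() / N
    a_exp = (w * np.outer(n.sum(axis=1), n.sum(axis=0))).sum() / N ** 2
    return (a_obs - a_exp) / (1.0 - a_exp)

n = np.array([[4, 0, 0, 0], [0, 4, 1, 0], [0, 0, 4, 1], [0, 0, 0, 4]])
m = np.zeros((4, 4), dtype=int)
m[1, 3] = m[2, 2] = 1      # +1 on cells (2,4) and (3,3) in 1-based notation
m[1, 2] = m[2, 3] = -1     # -1 on cells (2,3) and (3,4)
n_prime = n + m

for scheme in ("linear", "quadratic", "sqrt"):
    w = agreement_weights(4, scheme)
    print(scheme, round(weighted_kappa(n, w), 4), round(weighted_kappa(n_prime, w), 4))
```

Because the margins of $n$ and $n'$ coincide, $A_e$ is the same for both tables and the comparison reduces to the sign of $u_{23}+u_{34}-u_{24}$: zero for linear, negative for quadratic, positive for the assumed sqrt weights.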

Moreover, the above example shows a special behavior of the linear weights: some Markov moves do not change the value of the weighted kappa. This also affects the problem of finding the configuration with maximum agreement, since in general such a configuration is not unique. We state below a result for linear weights, and we discuss this issue from a computational point of view in the next section.

Proposition 4.

Let us consider four indices $i_1<i_2\le j_1 <j_2$ or $j_1< j_2 \le i_1 < i_2$, and take the basic move m with $m_{i_1j_1}=m_{i_2j_2}=+1$ and $m_{i_1j_2}=m_{i_2j_1}=-1$. If n is a table with $n_{i_1j_2}>0$ and $n_{i_2j_1}>0$, then, using the linear weights, we get

$$\kappa_l(n) = \kappa_l(n+m) \, .$$

Proof.

As in the previous proposition, let us consider only the case $i_1<i_2\le j_1 <j_2$.

Notice that the conditions $n_{i_1j_2}>0$ and $n_{i_2j_1}>0$ are needed in order to have a nonnegative table $n'=n+m$. Since $n$ and $n'$ have the same margins, it is enough to compare the observed agreements. From Eq. (16), using the linear disagreement weights $u_{ij}=|i-j|/(k-1)$, we get:

$$\begin{aligned} A_{o,w}(n')-A_{o,w}(n) &= A_{o,w}(m) = - \frac{1}{N} \sum_{i,j=1}^k u_{ij}m_{ij} \\ &= -\frac{1}{N} \cdot \frac{(j_1-i_1)+(j_2-i_2)-(j_2-i_1)-(j_1-i_2)}{k-1} = 0 \, . \end{aligned}$$

$\square$

The condition $i_1<i_2\le j_1 <j_2$ means that the move is applied on one side of the table with respect to the diagonal, and one nonzero element of the move is on the diagonal when $i_2=j_1$. Under such a condition, the move does not affect the value of the linearly weighted kappa.
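Proposition 4 can also be checked by brute force over all index quadruples for a fixed number of levels. The sketch below uses $k=6$ (an arbitrary choice) and computes $-\sum_{ij}u_{ij}m_{ij}$, that is, $N$ times the change in $A_{o,w}$, for every move with $+1$ at $(i_1,j_1),(i_2,j_2)$ and $-1$ at $(i_1,j_2),(i_2,j_1)$, separating the moves that satisfy the hypothesis of Proposition 4 from those that straddle the diagonal.

```python
import itertools
import numpy as np

def linear_disagreement_weights(k):
    """Linear disagreement weights u_ij = |i - j| / (k - 1)."""
    idx = np.arange(k)
    return np.abs(idx[:, None] - idx[None, :]) / (k - 1)

def move_effect(u, i1, i2, j1, j2):
    """N times the change in A_{o,w} caused by the move with +1 at (i1,j1),(i2,j2)
    and -1 at (i1,j2),(i2,j1), i.e. -sum_{ij} u_ij m_ij."""
    return -(u[i1, j1] + u[i2, j2] - u[i1, j2] - u[i2, j1])

k = 6
u = linear_disagreement_weights(k)
one_side, straddling = [], []
for i1, i2, j1, j2 in itertools.product(range(k), repeat=4):
    if not (i1 < i2 and j1 < j2):
        continue
    effect = move_effect(u, i1, i2, j1, j2)
    if i2 <= j1 or j2 <= i1:        # hypothesis of Proposition 4
        one_side.append(effect)
    else:                           # the move straddles the diagonal
        straddling.append(effect)

print(max(abs(e) for e in one_side))    # zero up to floating-point rounding
print(max(straddling))                  # positive: such moves can change kappa_l
```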

The uniqueness of the table with a given value of $\kappa_w$ is not guaranteed under any weighting scheme but, in view of Proposition 4, this issue is especially relevant for the linear weights. To illustrate this, let us consider the table (with synthetic data) in Fig. 4a. By direct enumeration of the 644,850 tables of the fiber, one finds 1,527 tables with the same margins and the same value of the weighted kappa with linear weights as the observed table, i.e., $\kappa_l=0.5023$. Among those tables, the weighted kappa with quadratic weights ranges from $\kappa_q=0.3774$ to $\kappa_q=0.7406$. The minimum is achieved in 3 tables, one of which is in Fig. 4b, while the maximum is achieved in 3 tables, one of which is in Fig. 4c.

Figure 4. A synthetic observed table (a) and two tables with the same margins and with the same weighted kappa under linear weights (b, c).
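Direct enumeration of a fiber is feasible for small tables. The following sketch enumerates all nonnegative integer tables with given row and column sums by plain recursion (a generic approach, not necessarily the one used for Fig. 4) and counts the tables tied with the observed one under linear weights. The $3\times 3$ table is hypothetical, since the data of Fig. 4a are not reproduced here. Because the margins are fixed, two tables have the same $\kappa_l$ exactly when they have the same integer disagreement $\sum_{i,j}|i-j|\,n_{ij}$, so the comparison can be done without floating-point arithmetic; the same holds for $\kappa_q$ with $\sum_{i,j}(i-j)^2 n_{ij}$.

```python
def compositions(total, caps):
    """All vectors x >= 0 with sum(x) == total and x[j] <= caps[j]."""
    if len(caps) == 1:
        if total <= caps[0]:
            yield [total]
        return
    for x in range(min(total, caps[0]) + 1):
        for rest in compositions(total - x, caps[1:]):
            yield [x] + rest

def fiber(row_sums, col_sums):
    """Yield all nonnegative integer tables with the given margins."""
    last = len(row_sums) - 1
    def fill(r, cols_left):
        if r == last:
            # the last row is forced by the remaining column sums
            if sum(cols_left) == row_sums[r]:
                yield [list(cols_left)]
            return
        for row in compositions(row_sums[r], cols_left):
            rest = [c - x for c, x in zip(cols_left, row)]
            for tail in fill(r + 1, rest):
                yield [row] + tail
    yield from fill(0, list(col_sums))

def linear_disagreement(t):
    return sum(abs(i - j) * x for i, row in enumerate(t) for j, x in enumerate(row))

def quadratic_disagreement(t):
    return sum((i - j) ** 2 * x for i, row in enumerate(t) for j, x in enumerate(row))

observed = [[3, 1, 0], [1, 2, 1], [0, 1, 3]]      # hypothetical observed table
rows = [sum(r) for r in observed]
cols = [sum(c) for c in zip(*observed)]
tables = list(fiber(rows, cols))
target = linear_disagreement(observed)
ties = [t for t in tables if linear_disagreement(t) == target]   # same kappa_l
quad = sorted({quadratic_disagreement(t) for t in ties})         # distinct kappa_q levels
print(len(tables), len(ties), quad)
```

For larger tables such as the one in Fig. 4a the same recursion works in principle, but the size of the fiber grows quickly, which is one motivation for the Markov chain approach of the next section.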

Let us now turn to the multi-rater setting. From Eqs. (8), (9), and (10), it is easy to see that the effect of a basic move on the value of Conger's $\kappa_C$ is determined by the two-way margins of the move: each two-way projection of the move acts on one two-way margin of the table and gives its own contribution to the sum in Eq. (8).

The following proposition collects the properties of the two-way margins of a basic move, and its proof is immediate.

Proposition 5.

Let m be a basic move in the multi-rater case with r raters. Suppose that m is equal to $+1$ in $(i_1, \ldots, i_r)$ and in $(i'_1,\ldots, i'_r)$ and is equal to $-1$ in $(j_1, \ldots, j_r)$ and in $(j'_1, \ldots, j'_r)$. The projection of m on the pair (U, V) is:

  • a basic move for the two-way problem, if the four pairs $(i_u, i_v)$, $(i'_u,i'_v)$, $(j_u,j_v)$, $(j'_u,j'_v)$ are all distinct;

  • a null move, otherwise.

Note that, by the definition of basic move in Definition 5, it is easy to see that a multi-rater basic move m always yields a basic move on at least one two-way margin.

For example, both the moves for the three-rater problem displayed in Fig. 3 produce a basic move on two two-way margins and a null move on one margin.
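The two-way projections of Proposition 5 are easy to compute. The sketch below stores a multi-rater move as a dictionary from $r$-tuples of levels to $\pm 1$ and sums it over all raters except $U$ and $V$. The three-rater move used here is hypothetical (the moves of Fig. 3 are not reproduced in this section) and is only assumed to be basic; it has $+1$ on two diagonal cells, as in Lemma 2, and, like the moves of Fig. 3, it projects to a basic move on two margins and to a null move on the third.

```python
import itertools
from collections import defaultdict

def two_way_projection(move, u, v):
    """Two-way margin (u, v) of a multi-rater move stored as {r-tuple: +1 or -1}."""
    proj = defaultdict(int)
    for cell, value in move.items():
        proj[(cell[u], cell[v])] += value
    return {c: x for c, x in proj.items() if x != 0}   # drop cancelled cells

# Hypothetical 3-rater move (levels coded 0 and 1) with +1 on two diagonal cells
move = {(0, 0, 0): +1, (1, 1, 1): +1, (0, 1, 1): -1, (1, 0, 0): -1}

for u, v in itertools.combinations(range(3), 2):
    p = two_way_projection(move, u, v)
    # By Proposition 5, a nonzero projection of a basic move is a basic move
    print((u, v), "basic" if p else "null", p)
```

The two nonzero projections have $+1$ on $(0,0)$ and $(1,1)$ and $-1$ on $(0,1)$ and $(1,0)$, which is exactly the pattern of Lemma 1.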

In general, the analysis of the effect of the basic moves on Conger's $\kappa_C$ is more difficult than in the two-rater case. Nevertheless, we can state the following lemma, which generalizes Lemma 1.

Lemma 2

Let $i \ne j$ be two indices, and let m be a basic move with

$$m_{i\ldots i}=m_{j\ldots j}=+1 \, .$$

If $n+m$ is nonnegative, then

$$A_{o,w}(n+m) \ge A_{o,w}(n) \, . \tag{19}$$

Proof.

By Eq. (17), $A_{o,w}(n)$ is the average of the two-rater observed agreements over all pairs of raters. It is therefore enough to observe that each two-way margin of m is either a null move, which leaves the corresponding term unchanged, or a basic move with $+1$ in $(i,i)$ and $(j,j)$ satisfying the hypothesis of Lemma 1, which does not decrease it. $\square$

Figure 5. An observed table with 3 raters and 3 levels.

Also in the multi-rater case, the issue of non-uniqueness is especially relevant under linear weights. For instance, let us consider the three-way table in Fig. 5. Although the table is rather sparse (the sample size is 16 in a contingency table with 27 cells), there are 2,324 tables with the same margins and the same value $\kappa_{C,l}=0.4872$. Under quadratic weights, these tables yield values of $\kappa_{C,q}$ ranging from 0.3364 to 0.6313.

4. Simulated Annealing for Maximum Agreement

In this section, we show how to use a simulated annealing algorithm to determine the maximum value of the weighted kappa with fixed marginal distributions and to find a table where the maximum is actually reached. The Markov bases introduced in Sect. 2 are used in the algorithm to define the neighbors of the contingency tables and to navigate the fiber of an observed table.

While the maximum agreement is simple to compute for the unweighted $\kappa$ in the two-rater setting, the problem is not trivial for the weighted $\kappa_w$, or for Conger's $\kappa_C$ and $\kappa_{C_w}$ in the multi-rater problem.
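For the unweighted two-rater $\kappa$, the maximum has a closed form: with fixed margins, $A_o=\sum_i n_{ii}/N$ and each diagonal count satisfies $n_{ii}\le\min(n_{i+},n_{+i})$. This bound is attainable, because the residual row and column totals can always be placed off the diagonal, so $\kappa_{\max}=\bigl(\sum_i\min(n_{i+},n_{+i})/N-A_e\bigr)/(1-A_e)$. The helper below is a minimal sketch of this formula; its name is illustrative.

```python
import numpy as np

def max_unweighted_kappa(n):
    """Largest unweighted Cohen's kappa over all tables with the same margins as n."""
    n = np.asarray(n, dtype=float)
    N = n.sum()
    rows, cols = n.sum(axis=1), n.sum(axis=0)
    a_exp = float(rows @ cols) / N ** 2              # expected agreement, fixed by the margins
    a_max = float(np.minimum(rows, cols).sum()) / N  # largest attainable diagonal share
    return (a_max - a_exp) / (1.0 - a_exp)

# The 4 x 4 table n of Section 3
n = np.array([[4, 0, 0, 0], [0, 4, 1, 0], [0, 0, 4, 1], [0, 0, 0, 4]])
print(max_unweighted_kappa(n))
```

For this table the maximum is $25/27$, and the table $n'$ of Section 3, whose diagonal total is $17=\sum_i\min(n_{i+},n_{+i})$, attains it. No comparably simple bound is used here for the weighted indices, which motivates the search algorithm below.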

The Markov chain simulated annealing algorithm starts from the observed table and, at each step $b$ ($b=1, \ldots, B$), runs as follows. First, we choose a move m in the relevant Markov basis ${\mathcal{M}}$ and we define $n'=n+m$; if $n'$ is a nonnegative table, then we move the chain from n to $n'$ with a transition probability depending on two factors. On the one hand, the transition probability is equal to 1 if the move increases the observed agreement, while it is less than 1 if the move decreases the observed agreement, and the larger the decrease, the smaller this probability. On the other hand, the probability of accepting a decrease becomes smaller over time.

In practice, in the first part of the walk the Markov chain performs exploration, while in the second part it performs exploitation, because the probability of an actual move toward a table with smaller observed agreement decreases over time. With our notation, the transition probability is:

$$\min \left\{ \exp \left( \left( A_{o,w}(n')-A_{o,w}(n) \right)/\tau_b \right), 1 \right\} ,$$

where $A_{o,w}(n)$ is the observed agreement of the table n, and $\tau_b$ is the temperature at step b.

As a special feature of this algorithm, we have added a final step that applies all possible moves with two $+1$ entries on the main diagonal, thus exploiting the results in Lemmas 1 and 2. The pseudo-code of the algorithm is in Fig. 6.

The reader can refer to Suman and Kumar (2006) for a general introduction to simulated annealing in the discrete case and for a discussion of the computational details of the algorithm, such as the choice of the temperature function $\tau_b$. In our experiments, the choice of the function for the temperature decrease did not affect the performance of the algorithm, and thus we have used a temperature of the form $\tau_b=\tau_0 \cdot d^b$.

Figure 6. Simulated annealing for maximum agreement.

Note that the non-uniqueness of the optimal configuration remains an issue when finding the maximum, especially with linear weights. As an example in the two-rater framework, consider again the observed table in Fig. 4a. With linear weights, there are 5 tables that reach the maximum value $\kappa_l = 0.7511$, and among these tables $\kappa_q$ ranges from 0.7665 to 0.8703, the latter being also the maximum with quadratic weights. The maximum with the sqrt weights is $\kappa_s = 0.7528$. The three configurations obtained with our algorithm are displayed in Fig. 7. In accordance with the findings of the previous sections, quadratic weights avoid cells of strong disagreement, while sqrt weights fill the diagonal as much as possible. Again, the table with maximum linearly weighted kappa is not unique; in fact, the three tables in Fig. 7 share the same value of $\kappa_l$.
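Comparisons of this kind rely on computing Cohen's weighted kappa $(A_o - A_e)/(1 - A_e)$ under the three schemes. The sketch below uses the standard linear and quadratic agreement weights; the form $w_{ij} = 1 - \sqrt{|i-j|/(k-1)}$ for the sqrt weights is an assumption of the sketch. The data of Fig. 4a are not available in this section, so the example uses a small synthetic table instead.

```python
import math

def weights(k, scheme):
    """Agreement weights w_ij = 1 - f(|i - j| / (k - 1)) on a k-level scale,
    with f(x) = x**2 (quadratic), x (linear) or sqrt(x) (sqrt, assumed form)."""
    f = {"quadratic": lambda x: x * x, "linear": lambda x: x, "sqrt": math.sqrt}[scheme]
    return [[1.0 - f(abs(i - j) / (k - 1)) for j in range(k)] for i in range(k)]

def weighted_kappa(n, w):
    """Cohen's weighted kappa (A_o - A_e) / (1 - A_e) for a k x k table n."""
    k = len(n)
    N = sum(map(sum, n))
    rows = [sum(r) for r in n]
    cols = [sum(c) for c in zip(*n)]
    a_o = sum(w[i][j] * n[i][j] for i in range(k) for j in range(k)) / N
    # Expected agreement depends only on the margins.
    a_e = sum(w[i][j] * rows[i] * cols[j] for i in range(k) for j in range(k)) / N**2
    return (a_o - a_e) / (1.0 - a_e)

table = [[2, 1, 0], [0, 2, 1], [0, 0, 2]]
for scheme in ("quadratic", "linear", "sqrt"):
    print(scheme, round(weighted_kappa(table, weights(3, scheme)), 4))
```

For this synthetic table the linear weights give $A_o = 7/8$, $A_e = 9/16$ and hence $\kappa_l = 5/7$.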

Figure 7. Configurations with maximum weighted kappa for the observed table in Fig. 4 with quadratic weights (left), linear weights (center), sqrt weights (right).

For small- and medium-sized tables, the algorithm converges very fast, yielding both the maximum value of the weighted kappa and a table where this maximum is reached: on a standard PC, the algorithm for the two-rater problem runs in less than 1 s for tables with up to $k = 10$ rating categories (100 cells), and in less than 10 s for tables with up to $k = 18$ rating categories (324 cells). For large tables, convergence is slow, and the problem quickly becomes infeasible as the number of cells grows. On the one hand, large tables are usually sparse; on the other hand, the relevant Markov basis is large. As a consequence, at each step the probability of selecting an applicable move is very low, and the number $B$ of Markov chain steps must be quite large to ensure convergence. Further experiments are reported in the simulation study of the next section.
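To make the sparsity argument concrete, the sketch below computes the proportion of basic two-rater moves that can be applied to a given table. The counting convention (each pair of rows and pair of columns gives two signed moves, $2\binom{k}{2}^2$ in total, i.e. 4050 for $k = 10$) is an assumption of the sketch, and the function name is hypothetical.

```python
from itertools import combinations

def applicable_fraction(n):
    """Fraction of signed basic moves of a k x k table that keep all counts >= 0.

    A basic move is indexed by rows i < i2, columns j < j2 and a sign:
    sign +1 adds 1 to (i, j), (i2, j2) and subtracts 1 from (i, j2), (i2, j);
    sign -1 does the opposite.
    """
    k = len(n)
    total = applicable = 0
    for i, i2 in combinations(range(k), 2):
        for j, j2 in combinations(range(k), 2):
            total += 2
            # Sign +1 needs positive counts in the two cells that lose one unit.
            applicable += (n[i][j2] > 0 and n[i2][j] > 0)
            # Sign -1 needs positive counts in the other two cells.
            applicable += (n[i][j] > 0 and n[i2][j2] > 0)
    return applicable / total
```

For a $3 \times 3$ table with a single unit in each diagonal cell, only 3 of the 18 signed moves are applicable, so about 5 out of 6 proposals are wasted; the proportion typically drops further as tables grow sparser.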

Note that the fixed run length $B$ can be replaced with a stopping rule, and this is the strategy implemented in our simulation study. For instance, in small problems one can stop the algorithm when it produces no actual moves for 1,000 consecutive steps. For large tables, the stopping rule must also take into account the cardinality of the Markov basis. More details on this point are given in the next section.
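Such a stopping rule can be written independently of the chain. In the sketch below, the hypothetical function `run_until_stable` calls a user-supplied `step()`, assumed to perform one Markov chain step and to return whether the monitored state changed, and stops after `c` consecutive steps without change. The cap `max_steps` is an added safeguard, not part of the rule described in the text.

```python
def run_until_stable(step, c, max_steps=10**7):
    """Run step() until it reports no change for c consecutive calls.

    Returns the number of steps performed (the stopping time), or max_steps
    if the rule is never satisfied.
    """
    unchanged = 0
    for t in range(1, max_steps + 1):
        if step():
            unchanged = 0  # a change resets the counter
        else:
            unchanged += 1
            if unchanged >= c:
                return t
    return max_steps
```

With `c = 1000`, this reproduces the rule suggested above for small problems.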

In general, applying algebraic statistics to large tables is problematic, and the curse of dimensionality is a known issue in this field. The development of new techniques to speed up the convergence of Markov chain-based algorithms within algebraic statistics is still an active research topic (see, for instance, Windisch, 2016), and currently only ad hoc solutions for special problems are available.

5. Simulation Study

In order to show the practical applicability of the algorithm introduced in the previous section, and to study its convergence properties, we have designed and performed a simulation study with several scenarios. For the two-rater case, we have considered three values of the number of levels $k$ ($k = 3, 5, 7$) and two sample sizes ($N = 20, 100$). Moreover, two types of marginal distributions are considered: a first case with homogeneous uniform margins and a second case with non-homogeneous margins. In the first case, the tables are generated from a multinomial distribution with probabilities given by $\mu \otimes \mu$, with $\mu = (1/k, \ldots, 1/k)$, while in the second case the probability parameter of the multinomial distribution is $\mu \otimes \nu$ with $\mu \ne \nu$. In the non-homogeneous case, the parameters $\mu$ and $\nu$ are chosen to account for the tendency of a rater to choose rating levels higher or lower than those of the other rater. For instance, in the $3 \times 3$ case, we have used $\mu = (2/5, 2/5, 1/5)$ and $\nu = (1/5, 2/5, 2/5)$.
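The data-generating step can be sketched with the Python standard library, drawing the $N$ objects one by one into the cells with probabilities $\mu_i \nu_j$. This is an illustrative re-implementation, not the code used for the study; the function name and the seed are arbitrary.

```python
import random
from collections import Counter

def simulate_table(mu, nu, N, rng=random):
    """Draw a k x k table of N ratings from a multinomial distribution with
    cell probabilities mu[i] * nu[j], i.e. the product mu (x) nu."""
    k = len(mu)
    cells = [(i, j) for i in range(k) for j in range(k)]
    probs = [mu[i] * nu[j] for i, j in cells]
    # Each of the N objects falls independently into one cell.
    counts = Counter(rng.choices(cells, weights=probs, k=N))
    return [[counts[(i, j)] for j in range(k)] for i in range(k)]

rng = random.Random(2022)
homogeneous = simulate_table([1 / 3] * 3, [1 / 3] * 3, 20, rng)
non_homogeneous = simulate_table([2 / 5, 2 / 5, 1 / 5], [1 / 5, 2 / 5, 2 / 5], 100, rng)
```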

Note that, with this procedure, the observed row and column margins are in general different even when the parameter of the multinomial distribution has the form $\mu \otimes \mu$; thus, finding the maximum weighted kappa is not trivial even in these scenarios. A simulation study for the three-rater case is also presented, limited to $k = 3, 5$ categories.

Convergence is assessed through a stopping rule: the algorithm stops when there is a sufficiently large number $c$ of consecutive steps with no change in the observed agreement (and therefore no change in the weighted kappa). The number $c$ must take into account the number of moves in the Markov basis $\mathcal{M}$; here we set $c = \max\{10 \cdot \#\mathcal{M}, 1000\}$, which is a reasonable trade-off between accuracy and speed. For each scenario, a sample of 1,000 tables is generated, and the distribution of the stopping time ST is approximated through the 1,000 observed values. The simulation study has been performed using three weighting schemes: quadratic, linear, and sqrt.
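The threshold $c$ can be computed as below for the two-rater problem. How $\#\mathcal{M}$ is counted (with or without sign) is not stated in this section, so the sketch assumes one move per pair of rows and pair of columns, i.e. $\binom{k}{2}^2$ moves; under this convention $c$ equals 1000 for $k = 3, 5$ and 4410 for $k = 7$. The three-rater basis is not covered.

```python
from math import comb

def two_rater_basis_size(k):
    """Assumed size of the two-rater Markov basis: one basic move per pair of
    rows and pair of columns, counted once up to its sign."""
    return comb(k, 2) ** 2

def stopping_threshold(basis_size):
    """c = max{10 * #M, 1000}: consecutive unchanged steps required to stop."""
    return max(10 * basis_size, 1000)

for k in (3, 5, 7):
    size = two_rater_basis_size(k)
    print(k, size, stopping_threshold(size))
```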

The results are displayed in Table 1 for the two-rater scenarios and in Table 2 for the three-rater scenarios, which report the mean, the standard deviation, and the 99th percentile of the convergence time ST. These tables only cover the case of homogeneous margins; since the results for non-homogeneous margins are very similar, they are reported as Tables 3 and 4 in the Appendix.
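The three summary statistics can be computed from the observed stopping times as sketched below. The paper does not state which standard deviation or percentile convention was used; the sketch assumes the sample standard deviation and the default "exclusive" method of `statistics.quantiles`.

```python
import statistics

def summarize_stopping_times(times):
    """Mean, standard deviation and 99th percentile of observed stopping times."""
    mean = statistics.mean(times)
    sd = statistics.stdev(times)  # sample standard deviation (n - 1 denominator)
    p99 = statistics.quantiles(times, n=100)[98]  # last of the 99 cut points
    return mean, sd, p99
```

Applied to the 1,000 stopping times of one scenario, the function returns one row of the kind reported in Tables 1 to 4.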

Table 1. Two-rater case with homogeneous marginal distributions.

Time to convergence (mean, standard deviation and 99th percentile) of the simulated annealing algorithm for different numbers of levels k and sample sizes N.

Table 2. Three-rater case with homogeneous marginal distributions.

Time to convergence (mean, standard deviation and 99th percentile) of the simulated annealing algorithm for different numbers of levels k and sample sizes N.

From the results, we see that the time to convergence increases with the sample size and with the dimension of the table, and this is particularly pronounced in the three-rater case. As discussed in the previous sections, when the number of raters increases, the number of basic moves in the Markov basis grows, and the probability of selecting a non-applicable move becomes high, especially for sparse tables. To cope with this, the threshold $c$ of the stopping rule must be large when the Markov basis is large, and consequently the execution time increases. For large sparse tables, the algorithm needs special attention in the choice of the numerical parameters and in the optimization of the selection of the moves. A thorough study in this direction is beyond the scope of the present paper, and for this reason we do not present the case of $7 \times 7 \times 7$ tables. Finally, with regard to the choice of the weights, we observe that the algorithm is slightly faster with linear weights.

6. Concluding Remarks

The analysis of kappa-type indices through basic Markov moves presented in this paper allows us to better understand the effect of the choice of the weights. In particular, it shows that the configuration with maximum kappa strongly depends on the weights, which makes the normalization of kappa statistics a non-trivial task. We have shown that, when the weights satisfy the triangle inequality, the table with maximum kappa looks quite different from the one obtained with quadratic weights; therefore, distance weights should be considered as an option when choosing the weights. Since the basic moves connect the fiber of all tables with the same margins, we have implemented a simulated annealing algorithm to find the configuration with maximum kappa and fixed margins in a general framework.

Future work will address the analysis of the maximum agreement when not all raters classify the same set of objects, and the speed-up of the simulated annealing algorithm, especially for large sparse tables. The convergence of Markov chain-based algorithms with Markov bases for large sparse tables is a general problem in algebraic statistics; thus, any advance in this direction would also represent notable progress in other fields of application. Finally, we have shown that the set of all tables with a given value of the linearly weighted kappa can be rather large, and that it can be explored through suitable Markov bases.

Acknowledgements

The author thanks the anonymous referees for their valuable suggestions. The author is a member of the INdAM-GNAMPA group.

Appendix

This appendix reports the results of the simulation study with non-homogeneous margins (Tables 3 and 4). See Sect. 5 for a description of the simulation study.

Table 3. Two-rater case with non-homogeneous marginal distributions.

Time to convergence (mean, standard deviation and 99th percentile) of the simulated annealing algorithm for different numbers of levels k and sample sizes N.

Table 4. Three-rater case with non-homogeneous marginal distributions.

Time to convergence (mean, standard deviation and 99th percentile) of the simulated annealing algorithm for different numbers of levels k and sample sizes N.

Footnotes

Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

Aoki, S., Hara, H., & Takemura, A. (2012). Markov bases in algebraic statistics. Springer.
Cicchetti, D. V., & Allison, T. (1971). A new procedure for assessing reliability of scoring EEG sleep recordings. American Journal of EEG Technology, 11(3), 101–110. https://doi.org/10.1080/00029238.1971.11080840
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1), 37–46. https://doi.org/10.1177/001316446002000104
Cohen, J. (1968). Weighted kappa: Nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213–220. https://doi.org/10.1037/h0026256
Conger, A. J. (1980). Integration and generalization of kappas for multiple raters. Psychological Bulletin, 88(2), 322–328. https://doi.org/10.1037/0033-2909.88.2.322
Diaconis, P., & Sturmfels, B. (1998). Algebraic algorithms for sampling from conditional distributions. Annals of Statistics, 26(1), 363–397. https://doi.org/10.1214/aos/1030563990
Fleiss, J. L., & Cohen, J. (1973). The equivalence of weighted kappa and the intraclass correlation coefficient as measures of reliability. Educational and Psychological Measurement, 33(3), 613–619. https://doi.org/10.1177/001316447303300309
Fleiss, J. L., Levin, B., & Cho Paik, M. (2003). Statistical methods for rates and proportions (3rd ed.). Wiley.
Kvålseth, T. O. (2018). An alternative interpretation of the linearly weighted kappa coefficients for ordinal data. Psychometrika, 83(3), 618–627. https://doi.org/10.1007/s11336-018-9621-1
Li, P. (2016). A note on the linearly and quadratically weighted kappa coefficients. Psychometrika, 81(3), 795–801. https://doi.org/10.1007/s11336-016-9501-5
LoMartire, R. (2020). R package rel: Reliability coefficients. https://CRAN.R-project.org/package=rel
Lowry, R. (2020). Kappa as a measure of concordance in categorical sorting. http://vassarstats.net/kappa.html
Rapallo, F. (2003). Algebraic Markov bases and MCMC for two-way contingency tables. Scandinavian Journal of Statistics, 30(2), 385–397. https://doi.org/10.1111/1467-9469.00337
Rapallo, F. (2005). Algebraic exact inference for rater agreement models. Statistical Methods and Applications, 14(1), 45–66. https://doi.org/10.1007/BF02511574
Schuster, C. (2004). A note on the interpretation of weighted kappa and its relations to other rater agreement statistics for metric scales. Educational and Psychological Measurement, 64(2), 243–253. https://doi.org/10.1177/0013164403260197
Shoukri, M. M. (2010). Measures of interobserver agreement and reliability. CRC Press.
Sim, J., & Wright, C. C. (2005). The kappa statistic in reliability studies: Use, interpretation, and sample size requirements. Physical Therapy, 85(3), 257–268. https://doi.org/10.1093/ptj/85.3.257
Sullivant, S. (2007). Toric fiber products. Journal of Algebra, 316(2), 560–577. https://doi.org/10.1016/j.jalgebra.2006.10.004
Sullivant, S. (2018). Algebraic statistics (Graduate Studies in Mathematics, Vol. 194). AMS.
Suman, B., & Kumar, P. (2006). A survey of simulated annealing as a tool for single and multiobjective optimization. Journal of the Operational Research Society, 57(10), 1143–1160. https://doi.org/10.1057/palgrave.jors.2602068
Umesh, U., Peterson, R. A., & Sauber, M. H. (1989). Interjudge agreement and the maximum value of kappa. Educational and Psychological Measurement, 49(4), 835–850. https://doi.org/10.1177/001316448904900407
Vanbelle, S. (2019). Asymptotic variability of (multilevel) multirater kappa coefficients. Statistical Methods in Medical Research, 28(10–11), 3012–3026. https://doi.org/10.1177/0962280218794733
Vanbelle, S., & Albert, A. (2009). A note on the linearly weighted kappa coefficient for ordinal scales. Statistical Methodology, 6(2), 157–163. https://doi.org/10.1016/j.stamet.2008.06.001
von Eye, A., & Mun, E.-Y. (2004). Analyzing rater agreement: Manifest variable methods. Lawrence Erlbaum Associates.
Warrens, M. J. (2013). Cohen’s weighted kappa with additive weights. Advances in Data Analysis and Classification, 7(1), 41–55. https://doi.org/10.1007/s11634-013-0123-9
Windisch, T. (2016). Rapid mixing and Markov bases. SIAM Journal on Discrete Mathematics, 30(4), 2130–2145. https://doi.org/10.1137/15M1022045

Figure 1. Two psychiatrists’ rating of severity of depression. The observed table (left), a table with the same margins and maximum agreement with linear weights (center), and the table with the same margins and maximum agreement with quadratic weights (right).

Figure 2. Four basic moves for the two-rater problem. a Two nonzero elements on the diagonal; b one nonzero element on the diagonal, the move lies on the upper triangle; c one nonzero element on the diagonal, the move lies on both the upper and the lower triangle; d no nonzero elements on the diagonal.

Figure 3. Two basic moves for the three-rater problem. A move of type (a) and a move of type (b) from Proposition 2.

Figure 4. A synthetic observed table (a) and two tables with the same margins and with the same weighted kappa under linear weights (b, c).

Figure 5. An observed table with 3 raters and 3 levels.
