
Estimation of Linear Models from Coarsened Observations: A Method of Moments Approach

Published online by Cambridge University Press:  10 March 2025

Bernard M. S. van Praag*
Affiliation:
Tinbergen Institute, University of Amsterdam, The Netherlands
J. Peter Hop
Affiliation:
Independent, The Netherlands
William H. Greene
Affiliation:
University of South Florida, USA
*
Corresponding author: Bernard M. S. van Praag; Email: [email protected]

Abstract

In the last few decades, the study of ordinal data in which the variable of interest is not exactly observed but only known to be in a specific ordinal category has become important. To emphasize that the problem is not confined to a single discipline, we will use the neutral term coarsened observation. For single-equation models, estimation of the latent linear model by Maximum Likelihood (ML) is routine. But for higher-dimensional multivariate models it is computationally cumbersome, as estimation requires the evaluation of multivariate normal distribution functions on a large scale. Our proposed alternative estimation method, based on the Generalized Method of Moments (GMM), circumvents this multivariate integration problem. It can be implemented by repeated application of standard techniques and provides a simpler and faster approach than the usual ML approach. It is applicable to multiple-equation models with $K$-dimensional error correlation matrices and ${J}_k$ response categories for the $k$th equation. It also yields a simple method to estimate polyserial and polychoric correlations. Comparison of our method with the outcomes of the Stata ML procedure cmp yields estimates that are not statistically different, while estimation by our method requires only a fraction of the computing time.

Type
Theory and Methods
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright
© The Author(s), 2025. Published by Cambridge University Press on behalf of Psychometric Society

1 Introduction

The statistical tools of empirical Psychometrics, Econometrics, Political Science, and many other empirical sciences, including marketing analysis, agriculture, health, and medical statistics, find their origin in the linear regression model. The idea is that a random phenomenon $Y$ can be predicted by variables $X$ in the sense that $Y\approx f\left(X;\beta \right)$, where $\beta$ is a parameter vector. Later, the same model is applied to explain the phenomenon $Y$ as caused by the variables $X$. In the econometrics literature since the 1960s this resulted in a host of different models, described in textbooks such as Greene (2018) and Cameron and Trivedi (2005). In psychometrics there was a similar development known as the Structural Equation Model (SEM) (Duncan, 1975; Hayduk, 1987; Bollen, 1989; Jöreskog, 1973). (For software, see, for example, Narayanan (2012).) The main idea behind the modeling approach is that the phenomenon $Y$ to be studied has a conditional mean function that depends on other variables ${X}_1,\dots, {X}_M$, say, $E(Y\mid X=x)=f\left(x;\beta \right)$, where $f(\cdot)$ is a continuous and differentiable function and $\beta$ is a set of parameters. In practice the model is usually taken to be linear, that is, $E\left[Y|X=x\right]={\beta}_0+{\beta}_1{x}_1+\dots +{\beta}_M{x}_M$, which produces the linear regression model. In the traditional approach, the variable $Y$ is continuous and directly observable. In economics the model approach (the first influential introduction is by Hood and Koopmans (1953)) is used to describe dependencies between economic variables, e.g., purchase intentions as a function of income, prices, education, family size, and age. In marketing the modeling approach is used to develop and assess the effects of advertising, prices, promotions, etc. (see, e.g., Fok (2017)). In medical statistics, a model is used to evaluate the response to diagnostic tests. These are a few examples to illustrate the pervasiveness of the regression model approach in empirical sciences.

The one-dimensional observation ${Y}_i$ may be replaced by a $K$ -dimensional vector, ${Y}_i=\left({Y}_{i1},\dots, {Y}_{iK}\right)$ and the function $f(.)$ by a $K$ -vector function, $f=\left({f}_1,\dots, {f}_K\right)$ .

In practice, the parameters of interest in such a model are estimated from a set of $N$ observations ${\left(\left\{{Y}_i,{X}_i\right\}\right)}_{i=1}^N$. Randomness enters the observed outcomes through the difference between the individual responses, ${Y}_i$, and the conditional expectation for individual observation $i$, $E({Y}_i\mid {X}_i={x}_i)=f\left({x}_i;\beta\right)$. Hence, we introduce an “error,” ${\varepsilon}_i$, or a $K$-vector of errors, ${\varepsilon}_i=\left({\varepsilon}_{i1},\dots, {\varepsilon}_{iK}\right)$, which is defined for each response as ${\varepsilon}_i\overset{def}{=}{Y}_i-f\left({X}_i;\beta \right)$. The error represents the aggregate of other possible, unobserved variables as well as the randomness of individual behavior. The random term is generated by a mean-zero process that operates stochastically independently of $X$. If the model is linear, we denote the discrepancy as the residual $\varepsilon \overset{def}{=}Y-{X}^{\prime}\beta$. This approach is applicable when the dependent variable(s) $Y$ and the explanatory variable(s) $X$ are cardinal, i.e., expressed in observable numerical values.

However, in many practical cases the dependent variable $Y$ is coarsened; it is only observable in terms of ordinal categories on a preference scale. Examples are subjective health status or well-being, reported as “bad” or “good,” “like” or “dislike,” “poor, fair, good, very good, excellent,” or some ordinal ranking from one to five where a cardinal interpretation becomes dubious. Although the observations take place in a coarsened mode, we interpret the discrete answers as reflecting a latent variable $Y$, the range of which is a continuum. In that situation, we say the observations are “coarsened” or condensed into a set of $J$ adjacent intervals ${\left\{\left({\nu}_{j-1},{\nu}_j\right]\right\}}_{j=1}^J={\left\{{S}_j\right\}}_{j=1}^J$ on the real axis. For an individual $i$, the latent observation is ${Y}_i\in {S}_{j_i}$ if the realized observation is ${j}_i$. Hence, the estimation of the model is complicated by two factors. First, there is the statistical problem that there is always a random error term involved. Second, there is the additional observation problem that the continuum of observations is coarsened or condensed (Maris, 1995) and mapped on a discrete event space $\left\{j=1,\dots, J\right\}$. We will call such data Ordinally Coarsened (OC).

Researchers in bioassay studies realized as early as 1934 that many variables of interest have an ordinally coarsened character (Bliss, 1934; Finney, 1947, 1971); in Economics, Sociology, and Political Science this realization came much later, in the 1960s and 1970s. For instance, a question on self-assessed health status may be answered with ordered labels varying from “very healthy” to “very unhealthy.” In modern datasets, especially survey data, such verbal evaluations are abundant. The World Happiness Report (Oxford University, 2024) is a notable example. Such coarsening is nearly always dictated by the fact that respondents are unable to quantify their answers directly on a continuous numerical scale, but only in terms of a few ordered, verbally described categories. These values are not expressed in numbers but in ordinal qualitative terms. In Psychometrics the popular item response theory offers many examples (see Wiberg et al. (2019)). To stress that the statistical problems are not specific to one discipline we will also speak of coarsened data.

In psychometric Structural Equation Models (SEM) things may be further complicated by the fact that the explanatory $X$ variables are sometimes coarsened as well (cf. Jöreskog (1973)). In this paper, we will assume that the explanatory $X$ variables are either directly observable on a continuous scale, coarsened in ordered classes labeled $1, 2,\dots$, or represented by dichotomous dummy variables taking the value zero or one.

A specific case of coarsened events is that of psychological testing (item response theory, IRT) and that of multiple-choice tests used in exams. Say the exam or test consists of $K$ items, each with $J$ ordered response categories. Then the test may be described by $K$ item scores ${Y}_1,\dots, {Y}_K$. The response to each separate item may be dichotomous (e.g., correct/false) or polychotomous (correct, not wholly correct, …, false).

The probability of the response on item $k$ is ${p}_{ikj}=F\left({b}_k;{\theta}_i\right)$, where ${b}_k$ stands for the difficulty of item $k$ and ${\theta}_i$ for the ability of respondent $i$. From the joint probability of the $K$ responses by individual $i$ we try to estimate the latent ability ${\theta}_i$ (e.g., IQ) of the individual by maximizing the joint probability ${\prod}_{k=1}^KF\left({b}_k;{\theta}_i\right)$ per individual $i$ with respect to ${\theta}_i$. The ability ${\theta}_i$ may also depend on (or be explained by) individual observable characteristics ${X}_i=\left({X}_{i1},\dots, {X}_{iM}\right)$, say ${\theta}_i={X}_i^{\prime}\beta +{D}_i{\beta}_{0i}$, where the dummy variable ${D}_i$ equals one for individual $i$ and zero for others, and ${\beta}_{0i}$ is the ability of individual $i$. The term ${X}_i^{\prime}\beta$ then gives the part of ability or intelligence that can be explained by, e.g., education, genetic factors, health, income, etc., while the individual parameter ${\beta}_{0i}$ may be identified as the unexplainable, truly individual ability component. We notice that the estimation of ${\beta}_{0i}$ requires that $K>1$, and preferably $K$ considerably larger than one.
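To make the mechanics concrete, the following minimal sketch estimates a single respondent's ability by maximizing the joint item probability, for the simple case of dichotomous items that are independent given ${\theta}_i$ and known difficulties ${b}_k$; all names and numerical values are illustrative, not part of the model above.

```python
# Illustrative per-respondent ability estimation in a probit IRT model with
# dichotomous items that are independent given theta_i; the difficulties b_k
# are treated as known here purely for the sake of the example.
import numpy as np
from scipy.stats import norm
from scipy.optimize import minimize_scalar

def estimate_theta(y, b):
    """y: 0/1 item scores of one respondent; b: item difficulties."""
    def neg_loglik(theta):
        p = np.clip(norm.cdf(theta - b), 1e-12, 1 - 1e-12)   # P(correct) per item
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return minimize_scalar(neg_loglik, bounds=(-5, 5), method="bounded").x

theta_hat = estimate_theta(y=np.array([1, 1, 0, 1]), b=np.array([-1.0, 0.0, 0.5, 1.0]))
```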

The method developed below may be used to estimate these $\beta$'s and ${\theta}_i$'s. The response probability $F\left({b}_k;{\theta}_i\right)$ of a separate item score is frequently assumed to be described by a normal or logistic distribution function. Estimation of ${\theta}_i$ is then rather easy. However, intuition tells us that the item scores of a respondent on different items will usually be correlated. This is almost unavoidable if the response behavior depends on one ${\theta}_i$. In that case, ML estimation becomes difficult, since it requires the evaluation of many multivariate integrals. The correlation means that the multivariate integrals cannot be reduced to products of one-dimensional integrals. The method proposed by us avoids this problem.

For directly observed cardinal data, ordinary least squares (OLS) is usually the default estimator of choice. There are many extensions of the OLS estimator that are used in nonstandard cases, such as nonzero covariances across observations. A familiar alternative to OLS is Generalized Least Squares (GLS), used when the disturbances of the $K$ observations per observation unit $i$ are heteroskedastic or correlated. Then an unknown error-covariance matrix has to be estimated as well. If this is feasible, we call the method Feasible Generalized Least Squares (FGLS). Another well-known example is Seemingly Unrelated Regressions (SUR), where $K$ response variables are explained by $K$ equations whose errors are correlated. We refer to well-known econometric textbooks such as Amemiya (1985), Cameron and Trivedi (2005), Greene (2018), and Verbeek (2017) for elaborate descriptions. We note that in modern work these estimations are usually based on the assumption of a known error distribution, usually multivariate normal, leading to Maximum Likelihood (ML) estimation. In the early literature, least squares was not explicitly based on an underlying error distribution, but rather on the minimization of a Sum of Squared Residuals (SSR), which led to unbiased estimation of the regression coefficients. Later it was found that SSR minimization and ML estimation lead to the same estimator when the errors are normally distributed. (The normality assumption was also used to motivate certain inference procedures.) The common counterpart of the linear regression model in case of coarsening of $Y$ is the Ordered Probit or Ordered Logit model (OP or OL). In the literature, it is frequently called Probit or Logit Regression (see, e.g., McKelvey and Zavoina (1975)).

In this paper we develop a novel approach to coarsened data, called Feasible Multivariate Ordered Probit (FMOP), for settings where the errors are suspected of being correlated, as is the case in, e.g., item response models, panel data, regional data, or customer satisfaction data. It follows the analogy principle as formulated by Goldberger (1968, 1991) and Manski (1988), based on the method of moments (MoM).

The established way to treat such models with coarsened correlated observations in psychometrics, econometrics, and other empirical applications is maximum likelihood (ML) estimation (see McFadden (1989) and Hajivassiliou and McFadden (1998)), where the likelihoods per observation involve $K$-variate normal integrals (instead of densities). Those integrals are generally estimated by simulation, such as by the Geweke–Hajivassiliou–Keane (GHK) algorithm. Theoretically, this is a straightforward application of ML theory. However, the practical problem is that the evaluation of those integrals by simulation may be quite cumbersome and time-consuming.

Geweke (1989), McFadden (1989), Keane (1994), Hajivassiliou and McFadden (1998), Cappellari and Jenkins (2003), Mullahy (2016), and others developed a multivariate probit estimator. Progress has also been made based on the simulated moments approach (McFadden, 1989), using the Gibbs sampler method (see Geman and Geman (1984) and Casella and George (1992)). An interesting historical survey on the (logit and) probit method is found in Cramer (2004). See also Hensher et al. (2015). Roodman (2007, 2020) developed a flexible working Stata estimation procedure (cmp) based on the GHK simulator.

Independently, scholars in Psychometrics expanded the vast literature on IRT models, yielding different tools of analysis, inspired by the differences in subject matter between disciplines (see Bollen (1989)). It is surprising that psychometricians and econometricians work on essentially the same methodological problems, but mostly without taking much notice of each other's literature. A rare exception is the econometrician Goldberger (1971), who explicitly recognized the commonalities between Econometrics and Psychometrics.

Our approach, based on sample moments, does not need the evaluation of likelihoods, i.e., multi-dimensional probability integrals or simulated moments. In this paper, we assume multivariate normal error distributions for $\varepsilon$, in line with established practice. Data on $X$ are assumed to be generated by a random process that is statistically (and functionally) independent of that which generates $\varepsilon$. The important element is that the observed values of ${X}_i$ convey no information about the errors ${\varepsilon}_i$, a property identified by econometricians as “strict exogeneity”: $E\left[\varepsilon \left|X\right.\right]=0$ for every $X$. It follows that $\mathrm{Cov}\left(X,\varepsilon \right)=\mathrm{Cov}\left(X,E\left[\varepsilon \left|X\right.\right]\right)=0$ as well. Data on $X$ are assumed to be “well-behaved,” meaning that in any random sample, the sample covariance matrix $\left(1/N\right)\sum \nolimits_{i=1}^N{X}_i^{\prime }{X}_i\overset{def}{=}{\widehat{\Sigma}}_X$ is always finite and positive definite. (Regularity conditions on $X$, such as that the influence of any individual ${X}_i$ in $\left(1/N\right)\sum \nolimits_{i=1}^N{X}_i^{\prime }{X}_i$ vanishes as $N$ increases, are also assumed.) Nothing further is assumed at this point about the distribution of ${X}_i$, e.g., normality, discreteness, etc.

The theoretical model, that is, the data-generating mechanism, mimics the classical linear statistical models. The substantive difference between exact and coarsened data is in the mode of observation of the dependent variable $Y$. In the classical framework the dependent variable $Y$ is directly observed, while in the Ordered Coarsened (OC) observation mode the dependent variable $Y$ is only observed to be in one of the $J$ intervals ${S}_j=\left({\nu}_{j-1},{\nu}_j\right]$, where the cut points ${\nu}_{j-1}$ and ${\nu}_j$ are unknown parameters to be estimated. If $Y$ is a $K$-vector, ${\nu}_{j-1}$ and ${\nu}_j$ are $K$-vectors and ${S}_j=\left({\nu}_{j-1},{\nu}_j\right]$ is a block in ${R}^K$. Since the cut points ${\nu}_{j-1}$ and ${\nu}_j$ are unknown, the unit of measurement of $Y$ is unidentified. The usual identification is secured by setting the error variances equal to one: ${\sigma}_k^2=1$ $\left(k=1,\dots, K\right)$.

The structure of this paper is as follows. In Section 2 we outline the basic probabilistic model in the presence of coarsening of the dependent variables. In Section 3 we develop the estimation method for a $K$-equations model based on the Ordered Coarsened data model. In IRT models, this is equivalent to $K$ item responses per respondent. We call the method Feasible Multivariate Ordered Probit (FMOP). We call it Seemingly Unrelated Ordered Probit (SUOP) when we have a $K$-equation model with one observation ${Y}_i=\left({Y}_{i1},\dots, {Y}_{iK}\right)$ per observation unit $i$. Of course, mixtures of FMOP and SUOP are possible as well. Instead of differentiating high-dimensional log-likelihoods, whose likelihood contributions are multi-dimensional integrals, with respect to the unknown parameters $\theta =\left(\beta, \nu, \Sigma \right)$, we derive sample moment conditions $\widehat{\overline{g}}\left(\theta \right)=0$ from the coarsened data that are analogues of the likelihood equations $\widehat{g}\left(\theta \right)=0$ for direct observations. We then estimate $\theta$ from the equation set $\widehat{\overline{g}}\left(\theta \right)=0$. We find that $\widehat{\overline{g}}\left(\theta \right)$ and $\widehat{g}\left(\theta \right)$ converge to the same probability limit for all values of $\theta$. The estimation of $\theta$ from $\widehat{\overline{g}}\left(\theta \right)=0$ takes place by repeatedly applying the generalized method of moments (GMM) (Hansen, 1982). We note in passing that our approach employs elements of the EM algorithm (Dempster et al., 1977). In Section 4 we demonstrate how to estimate polychoric correlations from SUOP estimates. Further, we generalize the concept of the coefficient of determination, ${R}^2$, to $K>1$. In Section 5.1 we apply FMOP to an employment equation over a five-year panel from the German SOEP data set. In Section 5.2 we apply SUOP to a block of eight satisfaction questions extracted from the German SOEP data. In order to get more insight into the stability of the method, in Section 5.3 we perform some experiments on a simulated data set. We use a recent update of the Stata procedure cmp as our benchmark to compare our alternative approach with the ML approach.

We find that the estimation results for regression coefficients and their standard errors do not differ substantially between the two methods. Where the established method may need hours, our method takes only minutes. In Section 6 we provide some concluding remarks. In the Appendix, we propose an easy, intuitively appealing method to estimate the latent full error-covariance matrix after the regression coefficients $\beta$ have been estimated from the coarsened data.

2 Regression in the population space

Regression analysis often begins with assumptions about the distribution of the observed data. Each estimation procedure on a sample may be seen as a reflection of a similar procedure in the population. For convenience but without loss of generality we assume $E(X)=0,E(Y)=0$ . The model of interest for equation k of $K$ equations is

(2.1) $$\begin{align}{Y}_k\mid x=\sum \limits_{m=1}^M{x}_{m,k}{\beta}_{m,k}+{\varepsilon}_k={x}_k^{\prime }{\beta}_k+{\varepsilon}_k,k=1,\dots, K\end{align}$$

Where the model design might call for differences across equations, they can be accommodated by suitable zero restrictions on the coefficients in ${\beta}_k$. For convenience, we ignore constant terms by setting $E(X)=0$. The expectation of the outcome or response ${Y}_k$ is conditioned or co-determined by explanatory variables/stimuli ${X}_m\ \left(m=1,\dots, M\right)$ assuming values ${x}_m$. There are $M$ variables ${X}_m$ that are generated by a strictly exogenous process. The error vector $\varepsilon$ derives from observation-specific variation around the theoretical conditional mean: ${\varepsilon}_k={Y}_k-{X}^{\prime }{\beta}_k$. Denoting the observations by ${y}_{i,k}$, the observed residual is defined as ${e}_{i,k}={y}_{i,k}-{x}_i^{\prime }{\beta}_k$. Deviations of observations $i=1,\dots, N$ of ${Y}_{i,k}$ from the conditional mean result from the presence of unobserved elements that enter the data-generating process, for example, variation across individuals in the self-assessment of health or well-being. Random elements are assumed to be generated by a zero-mean, finite-variance process, so that $E\left[\varepsilon \right]=0$ and $\mathrm{Var}\left[\varepsilon \right]$ is finite. We assume that the process that generates $X$ is stochastically independent of that of $\varepsilon$. This implies $E\left[ X\varepsilon \right]=0$. Substitution yields the familiar regression equations

(2.2) $$\begin{align}E\left[{X_i}^{\prime}\left({Y}_{i,k}-{X}_i^{\prime }{\beta}_k\right)\right]=0,k=1,\dots, K,\end{align}$$

where ${X_i}^{\prime }$ is an $\left(M\times K\right)$-matrix and ${X_i}^{\prime}\left({Y}_{i,k}-{X}_i^{\prime }{\beta}_k\right)$ a $K$-vector.

If this holds for all $i$ , then we have

(2.3) $$\begin{align}\sum \limits_{i=1}^NE\left[{X_i}^{\prime}\left({Y}_{i,k}-{X}_i^{\prime }{\beta}_k\right)\right] = 0 \forall k,m\end{align}$$

from which the regression coefficients $\beta$ can be solved.

We define the functions ${g}_{m,k}\left(\beta \right)=\sum \nolimits_{i=1}^NE\left[{X}_{i,k,m}\left({Y}_{i,k}-{X}_i^{\prime }{\beta}_k\right)\right]$. The equation system (2.3) in $\beta$ is then written compactly as $g\left(\beta \right)=0$.

Result (2.3) identifies the slopes $\beta$ of the conditional mean function. The zero conditional mean result in (2.2) motivates least squares without reference to minimizing the mean squared error prediction of $Y\left|X\right.$ or minimum variance linear unbiased estimation of $\beta$ .

If the error covariance matrix $\Sigma =E\left(\varepsilon {\varepsilon}^{\prime}\right)$ is not the identity matrix, we may want to correct for the unequal variances and correlations, and we weight the observations by ${\Sigma}^{-1}={\left[E\left(Y-{X}^{\prime}\beta \right){\left(Y-{X}^{\prime}\beta \right)}^{\prime}\right]}^{-1}$, producing

$$\begin{align*}{g}_{m,k}\left(\beta \right)=\sum \limits_{i=1}^NE\left[\left({X}_{i,m}^{\prime}\right){\Sigma}^{-1}\left({Y}_{i,k}-{X}_i^{\prime }{\beta}_k\right)\right].\end{align*}$$

By weighting the $K$ observations per individual $i$ by ${\Sigma}^{-1}$, variances are standardized and the error correlations are accounted for. In that case, according to the Aitken theorem, the covariance matrix of the estimator $\widehat{\beta}$ is minimized.

We have motivated least squares through the moment equations (2.2). We can interpret conditions (2.2) as first-order conditions for minimizing the expected weighted sum of squared residuals $S=E\left({\varepsilon}^{\prime }{\Sigma}^{-1}\varepsilon \right)=E\left[{\left(Y-{X}^{\prime}\beta \right)}^{\prime }{\Sigma}^{-1}\left(Y-{X}^{\prime}\beta \right)\right]$.

We do not need to specify the probability distribution of $X$ and $Y$. We do assume well-behaved data-generating processes, which include a finite, positive variance of $\varepsilon$ and a finite, positive-definite covariance matrix $\mathrm{Var}\left[X,Y\right]$. If the marginal probability distribution of $\varepsilon$ is multivariate normal, the regression estimator is Maximum Likelihood. We call ${X}^{\prime}\beta$ the structural part of the model and $\varepsilon$ the disturbance, where $\beta$ is the $\left(M\times K\right)$-matrix with columns ${\beta}_1,\dots, {\beta}_K$.

If the columns of the matrix $\beta$ are identical, this is the typical setting for longitudinal and panel data. If the coefficients ${\beta}_k$ vary by response setting $k$, we have the situation of $K$ different model equations.

2.1 Regression for Coarsened observations

We call an observation $Y\in R$ coarsened if it is not observed directly, but only as belonging to one of the $J$ intervals ${\left\{\left({\nu}_{j-1},{\nu}_j\right]\right\}}_{j=1}^J={\left\{{S}_j\right\}}_{j=1}^J$ (the leftmost and rightmost endpoints being $-\infty$ and $\infty$). These intervals constitute the class ℂ of observable events. We will call ℂ the observation grid. This is generalized to $K$-dimensional observations by replacing the $J$ observed intervals $\left({\nu}_{j-1},{\nu}_j\right]$ by $K$-dimensional blocks

$$\begin{align*}\left({\nu}_{j-1},{\nu}_j\right]=\left({\nu}_{1,{j}_1-1},{\nu}_{1,{j}_1}\right]\times \dots \times \left({\nu}_{K,{j}_K-1},{\nu}_{K,{j}_K}\right]={S}_j,\end{align*}$$

where the one-dimensional random observations ${j}_i$ are replaced by the index vectors ${j}_i=\left({j}_{i1},\dots, {j}_{iK}\right)$. Here ℂ stands for a partition of ${R}^K$ consisting of ${J}^K$ adjacent blocks. (We take ${J}_1=\dots ={J}_K\overset{def}{=}J$ for convenience, but equality is not necessary.) We denote the ℂ-coarsened event space by ${\Omega}_{\mathrm{\mathbb{C}}}$ and define the corresponding coarsened probability measure ${P}_{\mathrm{\mathbb{C}}}$ on ℂ. We will call ℂ the $K$-dimensional observation grid.

Coarsening of $Y$ implies that we do not directly observe $Y=y$ , but the event $Y\in {S}_j$ , and more explicitly for a $K$ -vector $Y$ that ${\nu}_{1,{j}_1-1}<{Y}_1\le {\nu}_{1,{j}_1},\dots, {\nu}_{K,{j}_K-1}<{Y}_K\le {\nu}_{K,{j}_K}$ . It follows that for given $X=x$ and $j=\left({j}_1,\dots, {j}_K\right)$ there holds for the error vector

$$\begin{align*}{\nu}_{1,{j}_1-1}-{x}_1^{\prime }{\beta}_1<{\varepsilon}_1\le {\nu}_{1,{j}_1}-{x}_1^{\prime }{\beta}_1,\dots, {\nu}_{K,{j}_K-1}-{x}_K^{\prime }{\beta}_K<{\varepsilon}_K\le {\nu}_{K,{j}_K}-{x}_K^{\prime }{\beta}_K,\end{align*}$$

which we denote shortly as $\varepsilon \in \left({S}_j- x\beta \right)$. We denote the marginal probability as ${p}_{k,j}=P\left({\nu}_{k,j-1}-{x}^{\prime}{\beta}_k<{\varepsilon}_k\le {\nu}_{k,j}-{x}^{\prime}{\beta}_k\right)$. We define the generalized residual as $\overline{\varepsilon}\overset{def}{=}E\left(\varepsilon \left|\varepsilon \right.\in \left({S}_j- x\beta \right)\right)$. It is a random $K$-vector defined on the blocks $\left({\nu}_{j-1}-{x}_i\beta, {\nu}_j-{x}_i\beta \right]$. These blocks constitute individual $i$'s individualized observation grid ℂ${}_i$. The grid ℂ${}_i$ for observation unit $i$ depends on ${x}_i^{\prime}\beta$. However, for any given value $x$ of $X$ and any grid ℂ$\left( x\beta \right)$ we have $\sum \nolimits_{j=1}^J{p}_{k,j}{\overline{\varepsilon}}_{k,j}\overset{def}{=}{E}_{\mathrm{\mathbb{C}}}\left({\overline{\varepsilon}}_k\right)=E\left({\varepsilon}_k\right)=0$, according to the Law of Iterated Expectations (LIE). Then it follows that

(2.4) $$\begin{align}{x}_{i,k}\cdot {E}_{\mathrm{\mathbb{C}}}\left({\overline{\varepsilon}}_{i,k}\right)=0,\quad \forall i,k\end{align}$$

and

(2.5) $$\begin{align}\left(1/N\right)\sum \limits_{i=1}^N{E}_{X_{i,k}}\left({X}_{i,k}\cdot {\overline{\varepsilon}}_{i,k}\right)=0,\quad k=1,\dots, K\end{align}$$

This is the coarsened analogue of (2.2). We define the function ${\overline{g}}_k\left(\beta \right)=\left(1/N\right)\sum \nolimits_{i=1}^NE\left({X}_i{\overline{\varepsilon}}_{i,k}\right)$. The equation system (2.5) is then written compactly as $\overline{g}\left(\beta \right) = 0$. If the error covariance matrix is not the identity matrix, we may want to correct for the unequal variances and correlations, and we write $\overline{g}\left(\beta \right)=\left(1/N\right)\sum \nolimits_{i=1}^N{X}_i{\overline{\Sigma}}^{-1}E\left({\overline{\varepsilon}}_i\right)$, where $\overline{\Sigma}=\mathrm{Var}\left(\overline{\varepsilon}\right)$.

Finally, we have for the two functions

(2.6) $$\begin{align}g\left(\beta \right)\equiv \overline{g}\left(\beta \right)\end{align}$$

This holds for all values of $\beta$, not only for the zero roots of (2.2) and (2.5). Identity (2.6) holds since for any $x$-value, $xE\left(\varepsilon \right)= xE\left(\overline{\varepsilon}\right)$.
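A small numerical check of this identity may be helpful. The following sketch (with $\beta =0$ and an arbitrary three-interval grid, both purely illustrative choices) verifies by Monte Carlo that the mean of the generalized residuals matches the mean of the raw errors, which is the LIE step that drives (2.6).

```python
# Monte Carlo check that the generalized residuals average to E(eps) = 0.
# Cut points, sample size, and the choice beta = 0 are illustrative.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
nu = np.array([-np.inf, -0.8, 0.3, np.inf])    # J = 3 intervals
eps = rng.standard_normal(1_000_000)
j = np.searchsorted(nu, eps) - 1               # observed interval index per draw
lo, hi = nu[j], nu[j + 1]
gen_resid = (norm.pdf(lo) - norm.pdf(hi)) / (norm.cdf(hi) - norm.cdf(lo))
print(eps.mean(), gen_resid.mean())            # both are close to zero
```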

3 Large sample results for regression

The Law of Large Numbers states that, under standard regularity conditions, sample moments converge in probability to their population counterparts as the number $N$ of observations grows large. Slutsky's theorem says that continuous and differentiable functions of random sample moments converge in probability to those functions of the population counterparts as $N$ grows large, implying that the population functions $g\left(\beta, \Sigma \right)$ are consistently estimated by plugging in the corresponding sample moments.

To obtain the (large-)sample estimator $\widehat{g}\left(\beta, \Sigma \right)$, we replace the expectations in (2.2) with the corresponding sample moments and get

(3.1) $$\begin{align}\frac{1}{N}\sum \limits_{i=1}^N{X}_i^{\prime }{\widehat{\Sigma}}^{-1}{Y}_i-\left(\frac{1}{N}\sum \limits_{i=1}^N{X}_i^{\prime }{\widehat{\Sigma}}^{-1}{X}_i\right)\widehat{\beta}\overset{def}{=}\widehat{g}\left(\widehat{\beta},\widehat{\Sigma}\right)=0,\end{align}$$

where $\widehat{\Sigma}=\frac{1}{N}\sum \limits_{i=1}^N\left({Y}_i-{\beta}^{\prime }{X}_i\right){\left({Y}_i-{\beta}^{\prime }{X}_i\right)}^{\prime }$ . Solution of (3.1) with respect to the regression coefficients $\beta$ yields

(3.2) $$\begin{align}\widehat{\beta}={\left[\frac{1}{N}\sum \limits_{i=1}^N\left({X}_i^{\prime }{\widehat{\Sigma}}^{-1}{X}_i\right)\right]}^{-1}\frac{1}{N}\sum \limits_{i=1}^N\left({X}_i^{\prime }{\widehat{\Sigma}}^{-1}{Y}_i\right).\end{align}$$

This is the well-known GLS estimator, which reduces to the OLS estimator when $\widehat{\Sigma}$ is the identity matrix.

Estimation of the asymptotic covariance matrix $\mathrm{Asy}.\mathrm{Var}\left[\widehat{\beta}\right]$ is usually understood to be conditional on $X$ equal to the sample data $x$. Then the well-known template is

(3.3) $$\begin{align}\mathrm{Est}.\mathrm{Asy}.\mathrm{Var}\left[\widehat{\beta}\right]=\frac{1}{N}{\left[\frac{1}{N}\sum \limits_{i=1}^N\left({X}_i^{\prime }{\widehat{\Sigma}}^{-1}{X}_i\right)\right]}^{-1}\end{align}$$
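As an illustration, the following sketch implements the iterated feasible GLS recursion behind (3.2) and (3.3) in the stacked form ${Z}_i={I}_K\otimes {x}_i^{\prime}$, for directly observed responses; function and variable names are our own, and the loop simply alternates between estimator (3.2) and the residual-based update of $\widehat{\Sigma}$.

```python
# Illustrative iterated feasible GLS for directly observed K-vector responses.
# Per unit i the design is Z_i = I_K (Kronecker) x_i'; the loop alternates
# between estimator (3.2) and the residual-based update of Sigma-hat.
import numpy as np

def fgls(X, Y, n_iter=10):
    """X: (N, M) regressors; Y: (N, K) responses. Returns coefficients, Sigma, avar."""
    N, M = X.shape
    K = Y.shape[1]
    Sigma = np.eye(K)                                 # first round: OLS
    for _ in range(n_iter):
        S_inv = np.linalg.inv(Sigma)
        A = np.zeros((K * M, K * M))
        c = np.zeros(K * M)
        for x, y in zip(X, Y):
            Z = np.kron(np.eye(K), x[None, :])        # (K, K*M) design for unit i
            A += Z.T @ S_inv @ Z
            c += Z.T @ S_inv @ y
        b = np.linalg.solve(A, c)                     # estimator (3.2)
        E = Y - X @ b.reshape(K, M).T                 # residuals per unit
        Sigma = E.T @ E / N                           # error-covariance update
    return b, Sigma, np.linalg.inv(A)                 # inv(A) equals (3.3)
```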

3.1 Estimation from ordered coarsened data

Let us now consider the coarsened analogue. The events are elements of the observation grid ℂ. The corresponding coarsened probability measure is ${P}_{\mathrm{\mathbb{C}}}$. If we were to follow the conditional ML strategy, the criterion to be maximized is $E\left(\ln {P}_{\mathrm{\mathbb{C}}}\right)=\sum \nolimits_{i=1}^N\sum \nolimits_{j=1}^J{P}_{\mathrm{\mathbb{C}}}\left({Y}_i\in {S}_j|{x}_i\right)\ln {P}_{\mathrm{\mathbb{C}}}\left({Y}_i\in {S}_j|{x}_i\right)$.

The problem here is the evaluation of the ${P}_{\mathrm{\mathbb{C}}}\left({Y}_i\in {S}_j|{x}_i\right)$, which are multivariate integrals over rectangular blocks in ${R}^K$. We can evaluate ${P}_{\mathrm{\mathbb{C}}}\left({Y}_i\in {S}_j|{x}_i\right)$ by its sample analogue, but this entails the evaluation of a multitude of $K$-dimensional integrals, making this procedure very cumbersome, albeit not impossible (see Roodman (2011)).

A much easier way is to make a detour and evaluate the coarsened analogue of condition (3.1). Let ${j}_i$ be the $K$-dimensional response of individual $i$. We notice that (the $K$-dimensional) ${Y}_i\in {S}_{j_i}\mid {x}_i$ implies ${\varepsilon}_i\in \left({\nu}_{i,{j}_i-1}-{x_i}^{\prime }B,{\nu}_{i,{j}_i}-{x_i}^{\prime }B\right]$. In this way each observation unit $i$ is assigned its own observation grid ℂ${}_i$, depending on ${x_i}^{\prime }B$. We define the $K$-vector of generalized residuals ${\overline{\varepsilon}}_i=E\left[{\varepsilon}_i\left|{\varepsilon}_i\in \left({\nu}_{i,{j}_i-1}-{x_i}^{\prime }B,{\nu}_{i,{j}_i}-{x_i}^{\prime }B\right],{X}_i={x}_i\right.\right]$.

The grid ℂ${}_i$ over which the expectation is taken on the left-hand side of (3.4) below is a grid in the space ${R}^{M+K}$, where the first $M$ coordinates (the $X$-coordinates) are directly observed, while the $K$ ${\varepsilon}_i$-coordinates are coarsened by ℂ${}_i$. Summing over the observations we get

(3.4) $$\begin{align}\frac{1}{N}\sum \limits_{i=1}^NE\left[{x}_i^{\prime }{\Sigma}^{-1}{\varepsilon}_i|x\right]=\frac{1}{N}\sum \limits_{i=1}^N{E}_{{\mathrm{\mathbb{C}}}_i}\left[{x}_i^{\prime }{\overline{\Sigma}}^{-1}{\overline{\varepsilon}}_i|x\right]\end{align}$$

Then (3.4) may be summarized as the identity

(3.5) $$\begin{align}g\left(\beta, \Sigma |x\right)\equiv \overline{g}\left(\beta, \overline{\Sigma}|x\right)\end{align}$$

This implies that the equations $\overline{g}\left(\beta, \overline{\Sigma}|x\right)=0$ and $g\left(\beta, \Sigma |x\right)=0$ have the same roots $\beta$. The vector function $\overline{g}\left(\beta, \Sigma \right)$ may be interpreted as the vector of derivatives of a criterion function, like a log-likelihood or a sum of squared residuals, with respect to $\beta$. The simplest criterion function with $\overline{g}\left(\beta, \Sigma \right)$ as gradient vector is

$$\begin{align*}\overline{S}=\frac{1}{N}\sum \limits_{i=1}^NE\left({\overline{\varepsilon}}^{\prime}\;{\overline{\Sigma}}^{-1}\overline{\varepsilon}|X={x}_i\right)=\frac{1}{N}\sum \limits_{i=1}^NE{\left[{\varepsilon}_i\left|{\varepsilon}_i\in \left({\nu}_{i,j-1}-{x}_i^{\prime}\beta, {\nu}_{i,j}-{x}_i^{\prime}\beta \right],X={x}_i\right.\right]}^{\prime }{\overline{\Sigma}}^{-1}E\left[{\varepsilon}_i\left|{\varepsilon}_i\in \left({\nu}_{i,j-1}-{x}_i^{\prime}\beta, {\nu}_{i,j}-{x}_i^{\prime}\beta \right],X={x}_i\right.\right]\end{align*}$$

This is the Sum of Squared Generalized Residuals. The identity (3.5) implies that $\overline{S}=E\left({\overline{\varepsilon}}^{\prime}\overline{\varepsilon}|X\right)$ and $S=E\left({\varepsilon}^{\prime}\varepsilon |X\right)$ have the same derivatives with respect to $\beta$; consequently, they are identical except for a constant. When we decompose the residual variance into the sum of between- and within-variance, $E\left({\varepsilon}^{\prime}\varepsilon |X\right)=E\left({\overline{\varepsilon}}^{\prime}\;\overline{\varepsilon}|X\right)+E\left({\ddot{\varepsilon}}^{\prime}\;\ddot{\varepsilon}|X\right)$ with $\ddot{\varepsilon}\overset{def}{=}\varepsilon -\overline{\varepsilon}$ the within-deviation, it is obvious that this constant difference is just the within-variance $S-\overline{S}=E\left({\ddot{\varepsilon}}^{\prime}\;\ddot{\varepsilon}|X\right)$, which appears not to depend on $\beta$. Matters are complicated by the fact that each individual $i$ has its own observation grid ℂ${}_i$.

The solution for $\beta$ is found as the root vector of the $K$ -equation system $\overline{g}\left(\beta, \Sigma |X\right)=0$ .

We now have to construct the sample analogue of $\overline{g}\left(\beta, \Sigma |X\right)$. The ${\overline{\varepsilon}}_{i,{j}_i}$ have not drawn much attention in the empirical literature. One notable exception is Heckman (1976), who appears to be the first author in the econometric literature to recognize the importance of this expected residual, later sometimes called the Heckman term (see also Van de Ven and Van Praag (1981)). Gouriéroux et al. (1987) called them generalized residuals and used them in the analysis of residuals. If the exact errors $\varepsilon$ are standard normally distributed, then we have for the coarsened errors the well-known formula

(3.6) $$\begin{align}{\overline{\varepsilon}}_{i,{j}_i}\overset{def}{=}E\left({\varepsilon}_i\left|{\nu}_{j_i-1}-{x}_i\beta <{\varepsilon}_i\le {\nu}_{j_i}-{x}_i\beta \right., X\right)=\frac{\varphi \left({\nu}_{j_i-1}-{x}_i\beta \right)-\varphi \left({\nu}_{j_i}-{x}_i\beta \right)}{\Phi \left({\nu}_{j_i}-{x}_i\beta \right)-\Phi \left({\nu}_{j_i-1}-{x}_i\beta \right)}\end{align}$$

If the errors are not normally but logistically distributed, the formulas for the truncated marginal distribution can be found, for example, in Johnson et al. (1994) or in Maddala (1983, p. 369). We shall restrict ourselves to the assumption of normally distributed errors.
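For reference, the generalized residual (3.6) is a one-line computation under standard-normal errors; the function below is an illustrative helper of our own naming, and the logistic analogue would follow from the truncated-logistic mean in the references just cited.

```python
# The generalized residual (3.6) for standard-normal errors: the conditional
# mean of eps given that it fell in the observed interval (lo, hi].
import numpy as np
from scipy.stats import norm

def generalized_residual(lo, hi):
    """E[eps | lo < eps <= hi] for eps ~ N(0,1); lo and hi may be +-inf."""
    return (norm.pdf(lo) - norm.pdf(hi)) / (norm.cdf(hi) - norm.cdf(lo))

print(generalized_residual(-np.inf, 0.0))      # -phi(0)/Phi(0) = -0.7979...
```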

Since there are no natural units observed, we can only estimate the $\beta$'s in (2.1) up to their ratios. A way to make them identifiable is to assume ${\sigma}_k=1$ for $k=1,\dots, K$, which is the traditional assumption in Probit and item response analysis.

The sample moment analogue of (3.1) is

(3.7) $$\begin{align}\frac{1}{N}\sum \limits_{i=1}^N\left[{x}_i^{\prime }{\Sigma}^{-1}{\overline{e}}_{i,{j}_i}\right]=\frac{1}{N}\sum \limits_{i=1}^N\left[{x}_i^{\prime }{\Sigma}^{-1}\frac{\varphi \left({\nu}_{j_i-1}-{x}_i\beta \right)-\varphi \left({\nu}_{j_i}-{x}_i\beta \right)}{\Phi \left({\nu}_{j_i}-{x}_i\beta \right)-\Phi \left({\nu}_{j_i-1}-{x}_i\beta \right)}\right]=\widehat{\overline{g}}\left(\beta \right)=0,\end{align}$$

where ${j}_i$ is the index of the interval/block observed for the observation unit $i$ .

Notice that (3.7) is a concise presentation of a system of $K$ blocks of $M$ equations, each equation corresponding to one element of the coefficient matrix $\beta$, where we assume that each of the $K$ blocks contains $M$ different coefficients ${\beta}_k$.

The cut-points $\nu$ remain to be estimated. There are $K\times \left(J-1\right)$ of them. Therefore, we derive an additional set of $\nu$-identifying equations. The cut-points $\nu$ can easily be estimated one by one by applying the following strategy (called binarization). For each equation we define $J-1$ auxiliary binary variables ${\overline{\varepsilon}}_{i,j}^b$, which assume either the lower value $E\left({\varepsilon}_i\left|{\varepsilon}_i\le {\nu}_j-{x}_i\beta \right.\right)$ or the upper value $E\left({\varepsilon}_i\left|{\varepsilon}_i>{\nu}_j-{x}_i\beta \right.\right)$. We have

(3.8) $$\begin{align}P\left({\varepsilon}_i\le {\nu}_j-{x}_i\beta \right)\cdot E\left({\varepsilon}_i\left|{\varepsilon}_i\le {\nu}_j-{x}_i\beta \right.\right)+P\left({\varepsilon}_i>{\nu}_j-{x}_i\beta \right)\cdot E\left({\varepsilon}_i\left|{\varepsilon}_i>{\nu}_j-{x}_i\beta \right.\right)=0\end{align}$$

Again, there holds $E\left({\overline{\varepsilon}}_{i,j}^b\right)=0$, due to the LIE. For the sample counterparts this implies

$$\begin{align*}\mathrm{plim}\left[\frac{1}{N}\sum {\overline{e}}_{i,j}^b\right]=0,\quad j=1,2,\dots, J-1.\end{align*}$$

The sample moment analogues are

(3.8a) $$\begin{align}\frac{1}{N}\left[\sum \limits_{i\left({j}_i\le j\right)}\frac{\varphi \left({\nu}_{k,j}-{x}_{i,k}{\beta}_k\right)}{\Phi \left({\nu}_{k,j}-{x}_{i,k}{\beta}_k\right)}-\sum \limits_{i\left({j}_i>j\right)}\frac{\varphi \left({\nu}_{k,j}-{x}_{i,k}{\beta}_k\right)}{1-\Phi \left({\nu}_{k,j}-{x}_{i,k}{\beta}_k\right)}\right]=0,\quad k=1,\dots, K,\;j=1,\dots, J-1,\end{align}$$

from which the cut-points ${\nu}_{k,j}$ can easily be estimated, as the left-hand side is monotone in ${\nu}_{k,j}$. We notice that these observations are not yet weighted by an error-covariance matrix.

Summing up, we end with the two equation systems (3.7) and (3.8a), from which the parameters $\beta$ and $\nu$ are estimated. This can be done by applying the Generalized Method of Moments (GMM) (Hansen, 1982). We refer to well-known textbooks such as Cameron and Trivedi (2005), Greene (2018), and Verbeek (2017) for elaborate descriptions. The software can be found, e.g., in Stata. We use an iterative calculation scheme. Starting with $\beta =0$, a first round yields ${\beta}^{(1)},{\nu}^{(1)}$. Taking these values as a starting point, we repeat the procedure until convergence, which is reached rather rapidly. The GMM method also provides us with an estimate of the covariance matrix of $\widehat{\overline{\beta}},\widehat{\overline{\nu}}$, using the well-known “sandwich” formula.
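To fix ideas, the following compact sketch carries out this scheme for a single equation ($K=1$, $\sigma =1$), stacking the sample moments (3.7) and (3.8a) and handing them to a standard root finder instead of Stata's GMM routine; it is a simplified illustration of the procedure, not the full FMOP implementation with ${\overline{\Sigma}}^{-1}$ weighting, and all names are our own.

```python
# Illustrative method-of-moments estimation for one ordered-probit equation.
# theta stacks beta (M,) and the J-1 interior cut points; moment_conditions
# returns the stacked sample moments (3.7) and (3.8a), whose root is the estimate.
import numpy as np
from scipy.stats import norm
from scipy.optimize import root

def moment_conditions(theta, X, j, J):
    """X: (N, M) regressors; j: (N,) observed categories coded 0, ..., J-1."""
    N, M = X.shape
    beta = theta[:M]
    nu = np.concatenate(([-np.inf], np.sort(theta[M:]), [np.inf]))
    lo, hi = nu[j] - X @ beta, nu[j + 1] - X @ beta
    gen_resid = (norm.pdf(lo) - norm.pdf(hi)) / (norm.cdf(hi) - norm.cdf(lo))
    g_beta = X.T @ gen_resid / N                 # sample analogue of (3.7)
    g_nu = []
    for c in range(1, J):                        # binarized conditions (3.8a)
        z, below = nu[c] - X @ beta, j < c
        g_nu.append((norm.pdf(z[below]) / norm.cdf(z[below])).sum()
                    - (norm.pdf(z[~below]) / (1.0 - norm.cdf(z[~below]))).sum())
    return np.concatenate([g_beta, np.array(g_nu) / N])

# Starting from beta = 0 and equally spaced cut points:
# theta0 = np.concatenate([np.zeros(M), np.linspace(-1.0, 1.0, J - 1)])
# theta_hat = root(moment_conditions, theta0, args=(X, j, J)).x
```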

4 Polychoric correlations and coefficients of determination

Suppose we have two test items ${Y}_1,{Y}_2$ $\left(K=2\right)$ available by which we may, for example, examine an individual, test the effect of a specific therapy, or record a response to a satisfaction question. For simplicity, we assume both items are yes/no questions. Then we are of course interested in how correlated the two test items ${Y}_1,{Y}_2$, and therewith the responses on the two items, are. The latent correlation between items is known in the psychometric literature as the polychoric correlation. The literature on polychoric correlations is massive. Out of the host of excellent contributions we refer to the seminal Olsson (1979) and to the more recent Liu et al. (2021) and Moss and Grønneberg (2023) for further analysis. The problem is clearly how to estimate correlations between the latent variables ${Y}_1,{Y}_2$ if we only have a $2\times 2$ contingency table at our disposal. We propose the following method.

The latent variables are modeled as in (2.1). The latent model is

(4.1) $$\begin{align}{Y}_{i,k}={X}_{i,k,1}{\beta}_{k,1}+\dots +{X}_{i,k,M}{\beta}_{k,M}+{\beta}_{k,0}+{\varepsilon}_{i,k},\qquad i=1,\dots, N,\;k=1,\dots, K,\end{align}$$

where in this case $k=1,2$. We then have

(4.2) $$\begin{align}\mathrm{Cov}\left({Y}_1,{Y}_2\right)={\beta}_1^{\prime }E\left({X}_1{X}_2^{\prime}\right){\beta}_2+{\sigma}_{1,2}\left(\varepsilon \right)\end{align}$$

and more generally, for $K$ items, we find the $K\times K$ covariance matrix

(4.3) $$\begin{align}\mathrm{Cov}(Y)={B}^{\prime }{\Sigma}_{XX}B+{\Sigma}_{\varepsilon \varepsilon}\end{align}$$

where ${B}^{\prime }$ stands for the $K\times M$ matrix of structural effects and ${\Sigma}_{\varepsilon \varepsilon}$ for the latent error covariance matrix. Now we derive the polychoric correlation from $\mathrm{Cov}(Y)$ in the usual way, that is, $\rho \left({Y}_1,{Y}_2\right)=\mathrm{Cov}\left({Y}_1,{Y}_2\right)/\sqrt{\mathrm{Var}\left({Y}_1\right)\mathrm{Var}\left({Y}_2\right)}$. The covariance matrix ${\Sigma}_{YY}=\mathrm{Cov}(Y)$ is estimated as

(4.4) $$\begin{align}{\widehat{\Sigma}}_{YY}={\widehat{B}}^{\prime }{\widehat{\Sigma}}_{XX}\widehat{B}+{\widehat{\Sigma}}_{\varepsilon \varepsilon}\end{align}$$

where $\widehat{B}$ is the estimated matrix of regression coefficients, ${\widehat{\Sigma}}_{XX}=\frac{1}{N}\sum \nolimits_{i=1}^N{X}_i{X}_i^{\prime }$, and ${\widehat{\Sigma}}_{\varepsilon \varepsilon}$ the estimated full error-covariance matrix, obtained as described in the Appendix.

We notice that in the case that no structural effects are found, i.e., $B=0$, we may still have non-zero polychoric correlations due to correlated errors. The corresponding correlations are found from the covariance matrix ${\widehat{\Sigma}}_{YY}$ in the usual way.
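In code, step (4.4) and the conversion to correlations amount to a few matrix operations; the inputs $\widehat{B}$, ${\widehat{\Sigma}}_{XX}$, and ${\widehat{\Sigma}}_{\varepsilon \varepsilon}$ are assumed to come from the estimation above and the Appendix, and the function name is our own.

```python
# Assemble the latent covariance matrix of Y as in (4.4) and convert it to
# polychoric correlations; B_hat, Sxx_hat, See_hat are assumed given.
import numpy as np

def polychoric_from_parts(B_hat, Sxx_hat, See_hat):
    """B_hat: (M, K); Sxx_hat: (M, M); See_hat: (K, K)."""
    Syy = B_hat.T @ Sxx_hat @ B_hat + See_hat      # formula (4.4)
    d = np.sqrt(np.diag(Syy))
    return Syy / np.outer(d, d)                    # correlations in the usual way
```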

The matrices $B$ , ${\Sigma}_{XX}$ are already consistently estimated. The latent (full) error-covariance matrix ${\Sigma}_{\varepsilon \varepsilon}$ is yet unknown. In the Appendix, we demonstrate how ${\Sigma}_{\varepsilon \varepsilon}$ is consistently estimated.

We note that this method does not assume normality of the random vectors $X$ or $\varepsilon$. We may also assume, for example, $\varepsilon$ to be logistic. In those cases the formula (3.6) for the generalized residual has to be replaced by the corresponding formula for the logistic, or in fact any, distribution, provided that the covariance matrix is finite.

In the second application below, where we estimate satisfactions, we present the estimated $8\times 8$ polychoric correlations between satisfactions as the off-diagonal elements in Tables 5–7 and 8a–8e. For the first application in Section 5.1 we might estimate such correlations as well, but given the panel nature of the data, this is not very interesting.

The relative explanatory power of the equation estimates depends on how volatile the outcomes are due to random errors. Consider (5.1). We have

(4.5) $$\begin{align}\mathrm{Var}\left({Y}_1\right)={\beta}_1^{\prime }E\left({X}_1{X}_1^{\prime}\right){\beta}_1+{\sigma}_{1,1}\left(\varepsilon \right)\end{align}$$

An attractive measure of fit, that is, of explanatory power, is the traditional coefficient of determination

(4.6) $$\begin{align}{R}^2=\frac{\beta_1^{\prime }E\left({X}_1{X}_1^{\prime}\right){\beta}_1}{\beta_1^{\prime }E\left({X}_1{X}_1^{\prime}\right){\beta}_1+{\sigma}_{1,1}\left(\varepsilon \right)}=1-\frac{\sigma_{1,1}\left(\varepsilon \right)}{\beta_1^{\prime }E\left({X}_1{X}_1^{\prime}\right){\beta}_1+{\sigma}_{1,1}\left(\varepsilon \right)}\end{align}$$

The sample analogue for equation $k$ is

(4.7) $$\begin{align}{\widehat{R}}_k^2=\frac{\frac{1}{N}\sum \limits_{i=1}^N{\left[\sum \limits_{m=1}^M{\widehat{\beta}}_{k,m}{x}_{i,k,m}\right]}^2}{\frac{1}{N}\sum \limits_{i=1}^N{\left[\sum \limits_{m=1}^M{\widehat{\beta}}_{k,m}{x}_{i,k,m}\right]}^2+1}\end{align}$$

where ${\sigma}_{k,k}\left(\varepsilon \right)=1$, as postulated in Section 3. This is the same measure as proposed by McKelvey and Zavoina (1975). We notice that these numbers may be interpreted as coefficients of determination of the regression equations (5.1) for $k=1,\dots, 5$, respectively. Of course, the regression is not actually performed, as ${Y}_{i,k}$ is not observable per individual. However, the regression correlation coefficient can be estimated by a detour using (4.6) and (4.7). We call these satisfaction correlation coefficients. They measure the part of the satisfaction variation that can be structurally explained by observable traits $X$. If there are $K$ equations, we get ${\widehat{R}}_1^2,\dots, {\widehat{R}}_K^2$. Out of curiosity we present in Table 2 our ${R}^2$ and the McFadden (1974) ${R}^2$ for Ordered Probit side by side.
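The sample analogue (4.7) is immediate to compute once the coefficients of equation $k$ are estimated; the following illustrative helper uses the normalization ${\sigma}_{k,k}\left(\varepsilon \right)=1$ of Section 3.

```python
# Sample analogue (4.7) of the coefficient of determination for equation k,
# with the error variance normalized to sigma_kk = 1 as in Section 3.
import numpy as np

def r_squared(X_k, beta_k):
    """X_k: (N, M) regressors of equation k; beta_k: (M,) estimated coefficients."""
    explained = np.mean((X_k @ beta_k) ** 2)       # structural variance term
    return explained / (explained + 1.0)
```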

5 Two empirical examples and one simulation experiment

In order to evaluate our method empirically we considered two data sets, both drawn from the German SOEP panel: a 2009–2013 panel data sequence and a block of eight satisfaction questions from the 2013 wave. The model in Section 5.1 is a set of five time-panel Ordered Probit equations with correlated errors. We call this estimation method Feasible Multivariate Ordered Probit (FMOP). It can be generalized to an arbitrary number of panel waves. The second data set consists of eight seemingly unrelated cross-section satisfaction questions, where errors are correlated. It is estimated in Section 5.2. We call this a Seemingly Unrelated Ordered Probit (SUOP) model. In addition, at the request of one of the referees, we present estimations on a simulated data set. The program code may be requested from the first author.

Given the results of the method, it becomes possible to estimate the latent full covariance matrix $\Sigma$ as well. We defer the description of how to estimate the full latent covariance matrix to the Appendix.

5.1 Employment status evaluation on a German five-year panel data set

Now we apply the FMOP method to a specific data set. We choose the employment situation of German workers; we do not pretend to make a study of German employment but merely test the feasibility of the method using these employment data. Following the lines above, we estimate the employment equation and the error covariance matrix using FMOP.

The data are derived from the German Socio-Economic Panel (SOEP) data set. Households are followed for a period of five successive years (2009–2013). We assume an unstructured error covariance matrix. All explanatory variables are measured as deviations from their averages.

We use the variable employment (variable e11103 in the German SOEP data set) in three self-reported categories: “not working,” “part-time working,” and “full-time working.” This implies that the five grids for the years 2009–2013 consist of three intervals each. As explanatory variables we use “age (18–75 years of age),” “age-squared,” dummy variables for “gender (female = 1),” “marital status: married (reference),” “marital status: single,” “marital status: separated,” “ln(household income minus individual labor income),” “number of children at home,” “years of education,” and dummy variables for “years of education unknown” and “living in East-Germany.” The latent variable is assumed to be generated by the linear equation

(5.1) $$\begin{align}\mathrm{Employment}&={\beta}_1\mathrm{Age}+{\beta}_2{\mathrm{Age}}^2+{\beta}_3\mathrm{Female}+{\beta}_4\mathrm{Single}+{\beta}_5\mathrm{Separated}+\nonumber\\ &\quad +{\beta}_6\mathrm{Ln}\left(\mathrm{HH}.\mathrm{LabourInc}\right)+{\beta}_7\mathrm{Children}+{\beta}_8\mathrm{Years}\_\mathrm{educ}+\nonumber\\ &\quad +{\beta}_9\mathrm{Years}\_\mathrm{educ}\_\mathrm{unknown}+{\beta}_{10}\mathrm{East}+\varepsilon \end{align}$$

As already said, we assume an “unstructured” $5\times 5$ covariance matrix $\overline{\Sigma}=\left({\overline{\rho}}_{k,{k}^{\prime }}\right)$. The results are presented in Table 1. In the left-hand panel, we show the in-between error covariance matrix $\widehat{\overline{\Sigma}}$; in the right-hand panel, the latent full covariance matrix $\widehat{\Sigma}$ as estimated by the simulation method described in the Appendix. The error correlation over time appears from the right panel to be quite considerable (1.0, 0.8775, 0.7800, …). When we look at the coarsened data, the correlation is mitigated by the coarsened observation but still considerable.

Table 1 In-between and full error covariance matrices for FMOP.

The regression estimates according to FMOP and Ordered Probit (errors independent) are presented in Table 2.

Table 2 Regression estimates from FMOP and Ordered Probit.

As expected, the regression estimates are of the same order, because both estimators are consistent. The difference is clearly in the calculated standard deviations. All FMOP standard errors are a factor of 1.5 to 2.0 larger than the OP estimates. This is because the OP assumption of error independence, in the face of the observed strong error correlations, amounts to a gross exaggeration of the reliability of the data material. The difference in standard deviations is a warning signal.

Out of curiosity we also ask what standard deviations we would have found if we had had the non-coarsened, that is, exact, data material at our disposal. Those standard deviations are estimated by the roots of the diagonal elements of $\frac{1}{N}{\left({X}^{\prime }{\widehat{\Sigma}}^{-1}X\right)}^{-1}$, the elements of which are known. The latent error-covariance matrix $\Sigma$ is estimated by $\widehat{\Sigma}$ according to the method described in the Appendix. We see from the comparison that the FMOP standard deviations based on the coarsened observations (e.g., 0.0059 for the AGE coefficient) are much larger than the corresponding theoretical values for Ordered Probit (0.0032) or for GLS estimation (0.0038) on the exact latent data. Next to our estimate of ${R}^2$ we also present the Probit ${R}^2$ as defined by McFadden (1974). We see that the latter is considerably smaller than ‘our’ ${R}^2$, which follows McKelvey and Zavoina (1975).

The total computation time was 8 seconds on a laptop. The computation process can be split into two parts: the first-stage OP estimation, taking 3 seconds, and the second-stage estimation, taking another 5 seconds.

We see that employment increases with age until age 42, after which it decreases (we excluded respondents under 18 and over 75 years of age). Females are less often employed than males. In households with children, respondents work less than in childless households. The more additional labor income in the household, the more the respondent works. The more years of education one has, the more one works full-time, while respondents from East Germany are less often employed than West Germans.

5.2 Seemingly Unrelated Ordered Probit (SUOP) on a block of eight satisfaction questions

In the German panel questionnaire, we find a number of satisfaction questions referring to various life domains, like those presented in Figure 1. Here, we apply the SUOP method.

This type of questioning is abundantly used in marketing research and happiness research. Another very important instance where the use of SUOP is at hand is the analysis of vignettes, also known as factorial surveys in sociological research or as conjoint analysis, now one of the major tools in psychology and marketing research (Green & Srinivasan, 1978; Atzmüller & Steiner, 2010; Wallander, 2009; Van Beek et al., 1997).

Fig. 1 A block of satisfaction questions with respect to various life domains.

The data set consists of about 15,000 observation units. Since the original formulation with 11 answer categories made the coarsened observations look very similar to continuous observations, we further coarsened the data into five response categories: (0,1,2), (3,4), …, (9,10). In this paper, we apply SUOP analysis to the above-listed block of satisfaction questions with respect to life domains from the 2013 wave of the GSOEP panel. We use the following explanatory variables: age and age-squared, dummies for being female, single, and separated, ln(individual labor income), ln(household income minus individual labor income), the number of children, the number of years of education, living in East Germany, a disability-status dummy (0 = no, 1 = yes), and a health rating (1 = bad health, …, 5 = very good health). Our primary objective is to demonstrate the feasibility of SUOP. It stands to reason that for a substantive analysis of domain satisfactions this model specification is probably too simplistic; for our objective of testing the feasibility of SUOP, however, this specific choice poses no problem. To avoid every dependent variable being explained by the same set of explanatory variables, we chose different subsets for each equation.

In Table 3 we present the estimates of the first two equations on Health and Sleep satisfaction. For the full table presenting all eight equation estimates we refer to the Appendix.

Table 3 Comparison of the parameter estimates and their s.e.’s for Ordered Probit, Method of Moments, and Maximum Likelihood.

In the first two columns we present the initial Probit estimates and their s.e.'s. In columns 3 and 4 we present the corresponding SUOP estimates and their s.e.'s. The two right-hand columns, 5 and 6, give the corresponding estimates by the ML method. We take the cmp results as the touchstone of our comparison.

Our first conclusion is that the three methods (OP, SUOP, ML) yield estimates that do not differ significantly in most cases. This is not surprising, as all three estimators are consistent. The standard deviations of the SUOP estimators seem to be slightly larger than those of the ML estimators, but the differences are mostly negligible.

In Table 4 we present the full correlation matrices as estimated by SUOP (estimation according to Appendix) and ML (according to Stata), respectively.

Table 4 Full error correlation matrices compared for SUOP and ML.

We see that of the 75 SUOP-estimated regression coefficients, 17 fall outside the ML confidence intervals. For the estimates of the correlation matrix we find a similar result.Footnote 6 Three of the 28 SUOP-estimated correlation coefficients are just outside the ML confidence intervals.

The polychoric correlation matrix is presented in Table 5.

Table 5 The polychoric correlation matrix.

A naïve approach is to assign the values 0,1,…,9,10 to the satisfaction categories and to calculate the Pearson correlations on that basis. This assignment conforms to common practice, where average satisfaction values in a sample are also based on it.

The Pearson correlations are presented in Table 6. There is a considerable difference between Tables 5 and 6. We prefer Table 5 to Table 6, as the cardinalization by 0,1,…,10 is arbitrary and might be replaced by another one (0,2,3,…), yielding a different Table 6, while the polychoric correlations are based on an endogenous cardinalization. We notice that all Pearson correlations in Table 6 are considerably smaller than the corresponding numbers in Table 5.

Table 6 The Pearson correlation matrix (scale 0–10).
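The arbitrariness of the cardinalization is easy to demonstrate numerically. The following sketch (with synthetic data of our own making, not the GSOEP sample) shows that rescoring the same ordinal categories with a different monotone score vector changes the Pearson correlation:

```python
import numpy as np

# Synthetic ordinal data: two correlated latent normals cut into 11 categories.
rng = np.random.default_rng(0)
z = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=5000)
cat_a = np.clip(np.floor((z[:, 0] + 3) / 6 * 11), 0, 10).astype(int)
cat_b = np.clip(np.floor((z[:, 1] + 3) / 6 * 11), 0, 10).astype(int)

naive = np.corrcoef(cat_a, cat_b)[0, 1]          # scores 0, 1, ..., 10
alt_scores = np.array([0, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12], dtype=float)
alt = np.corrcoef(alt_scores[cat_a], alt_scores[cat_b])[0, 1]
print(naive, alt)   # the two 'Pearson correlations' differ
```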

The computation process can be split into three parts: the first-stage OP estimation took 1.8 seconds on our ASUS VivoBook 15 laptop, and the second-stage SUOP estimation took another 111 seconds, so the whole calculation requires less than two minutes. The ML estimation by cmp (default settings) took 7 hours. In the SUOP method, we use the in-between covariance matrix $\widehat{\overline{\Sigma}}$ and not the full covariance estimate. The computation time depends on the sample size $N$, the size $K$ of the error covariance matrix, and the capacity of our laptop. In this example $K$ = 8. We see that for the ML method, the time increases non-linearly with $K$. For $K$ = 2 equations both methods are roughly equally fast: SUOP takes 11 seconds and ML-cmp 10.5 seconds for $N$ = 15,535. For $K$ = 3 the SUOP computation time increases to 16 seconds, while the ML method already requires 1,241 seconds. This is caused, it seems, by the fact that ML has to evaluate a large number of $K$-dimensional integrals. A colleague of ours (an expert Stata user), quoting the “options” section of the Stata documentation, observed that cmp uses the GHK simulation method for evaluating the needed integrals and that in the default option cmp uses $2\sqrt{N}$ draws per evaluated likelihood; in the present case, this is about 250 simulations per observation. Cappellari and Jenkins (Reference Cappellari and Jenkins2003) suggested that for a large number of observations, the number of draws can be considerably reduced without severe efficiency loss. According to our colleague, taking 5 draws per likelihood would reduce the computation time from the reported 7 hours to 8 minutes with only a slight efficiency reduction. That is probably still significantly slower than the new method, but the reduction would be material. We followed this suggestion and indeed found comparable estimates for the coefficients $\beta$. To our surprise, the standard deviations for the five-draw version were not significantly different from those of the 250-draw version. This seems to indicate that the additional contribution of the simulation variance is not taken into account in the assessment of the estimator variance.
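For intuition, here is a minimal sketch of the GHK idea for $K=2$ (our own illustration, not cmp's implementation): the cost per likelihood evaluation is linear in the number of draws $R$, which is why cutting $R$ from about 250 to 5 shrinks the computing time so dramatically.

```python
import numpy as np
from scipy.stats import norm

def ghk_rectangle(a, b, Sigma, R=250, seed=1):
    """GHK simulator for P(a < y < b) with y ~ N(0, Sigma), here K = 2."""
    L = np.linalg.cholesky(Sigma)
    u = np.random.default_rng(seed).uniform(size=R)
    # First dimension: exact interval probability, then draw z1 inside it.
    lo1, hi1 = norm.cdf(a[0] / L[0, 0]), norm.cdf(b[0] / L[0, 0])
    p1 = hi1 - lo1
    z1 = norm.ppf(lo1 + u * p1)
    # Second dimension: interval probability conditional on each z1 draw.
    lo2 = norm.cdf((a[1] - L[1, 0] * z1) / L[1, 1])
    hi2 = norm.cdf((b[1] - L[1, 0] * z1) / L[1, 1])
    return np.mean(p1 * (hi2 - lo2))

Sigma = np.array([[1.0, 0.5], [0.5, 1.0]])
print(ghk_rectangle([-1.0, -1.0], [0.5, 1.5], Sigma, R=250))
```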

Clearly, if we reduced the number of equations from eight to a more manageable four or two and/or reduced the number of observations, both the ML and the SUOP methods would perform much faster.

Our conclusion is that the SUOP method is faster than the ML method. We are unable to say whether the Stata procedure cmp is to blame and could be improved, or whether this is a general feature of the ML-GHK procedure. It might also be that we could have reduced the ML computation time by choosing specific options instead of the default procedure. Choosing very tight tolerance levels for the iterations involved would have increased the computation time in exchange for more exact confidence intervals. However, given that the cmp outcomes have about the same confidence intervals as our SUOP outcomes, we do not believe that the tolerance levels chosen in cmp were tighter than in our method (see Table 8f).

5.3 A simulated example

Finally, we apply the estimation method to a simulated data set of 10,000 observations with exactly observed (“hard”) dependent variables, generated as follows. We assumed the latent model

$$\begin{align*}y_{i,1}&=x_{i,1}+x_{i,2}+x_{i,3}+x_{i,4}+\varepsilon_{i,1}\\ y_{i,2}&=x_{i,1}+2x_{i,2}+3x_{i,3}+4x_{i,4}+\varepsilon_{i,2}\\ y_{i,3}&=-x_{i,1}+2x_{i,2}-3x_{i,3}+4x_{i,4}+\varepsilon_{i,3}\\ y_{i,4}&=-x_{i,1}+0.5x_{i,2}-x_{i,3}+x_{i,4}+\varepsilon_{i,4},\end{align*}$$

where ${x}_1$ is standard normal $N\left(0,1\right)$; ${x}_2=0.5{x}_1+D1$ with D1 a dummy variable equal to +1 or −1 with probability 50% each; ${x}_3$ is the sum of an $N\left(0,2\right)$ variate and $0.5{x}_2$; and ${x}_4=-0.5{x}_3+D2$, where D2 is drawn to equal +1, 0, or −1 with equal probability.

The error vector $\varepsilon$ is i.i.d. $\mathrm{N}\left(0,\Sigma \right)$ with

$$\begin{align*}\Sigma =\begin{pmatrix}\ \ 1.0 & & & \\ \ \ 0.5 & 1.0 & & \\ -0.5 & 0.3 & 1.0 & \\ \ \ 0.2 & 0.6 & -0.1 & 1.0\end{pmatrix}.\end{align*}$$

We notice that all four variables $x$ and the error vector have expectation zero. To prevent the (unconditional) ${Y}_i$ from being approximately normal, we restricted the number of explanatory variables to four and chose them to be non-normal and correlated, so that the structural part ${\beta}^{\prime }x$ does not tend to normality. Our first aim is to look at the distribution of the exact data. The expectation $E(Y)=0$, the empirical mean equals 0.0160, and the variance var(Y) equals 2.603. The correlation matrix of the variables $X$ and $Y$ is the 8 × 8 matrix in Table 8f.

Cut points are defined as
$$\begin{align*}&\nu_{1,1}=0;\\ &\nu_{2,1}=-1,\quad \nu_{2,2}=1.5;\\ &\nu_{3,1}=-1,\quad \nu_{3,2}=0.5;\\ &\nu_{4,1}=-1.5,\quad \nu_{4,2}=-0.5,\quad \nu_{4,3}=1.\end{align*}$$

We define the response indicators $j_{i,1}\in \left\{1,2\right\}$, $j_{i,2}\in \left\{1,2,3\right\}$, $j_{i,3}\in \left\{1,2,3\right\}$, and $j_{i,4}\in \left\{1,2,3,4\right\}$.
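A minimal sketch of this data-generating process in Python (our own illustration; we read $N(0,2)$ as variance 2 and take D2 equal to +1, 0, −1 with probability one third each, as stated above):

```python
import numpy as np

rng = np.random.default_rng(42)
N = 10_000

# Regressors as described in the text.
x1 = rng.standard_normal(N)
d1 = rng.choice([-1.0, 1.0], size=N)
x2 = 0.5 * x1 + d1
x3 = rng.normal(0.0, np.sqrt(2.0), size=N) + 0.5 * x2   # N(0,2) read as variance 2
d2 = rng.choice([-1.0, 0.0, 1.0], size=N)
x4 = -0.5 * x3 + d2
X = np.column_stack([x1, x2, x3, x4])

B = np.array([[ 1.0, 1.0,  1.0, 1.0],     # coefficient rows of the 4 equations
              [ 1.0, 2.0,  3.0, 4.0],
              [-1.0, 2.0, -3.0, 4.0],
              [-1.0, 0.5, -1.0, 1.0]])
Sigma = np.array([[ 1.0, 0.5, -0.5,  0.2],
                  [ 0.5, 1.0,  0.3,  0.6],
                  [-0.5, 0.3,  1.0, -0.1],
                  [ 0.2, 0.6, -0.1,  1.0]])
eps = rng.standard_normal((N, 4)) @ np.linalg.cholesky(Sigma).T
Y = X @ B.T + eps                          # latent continuous outcomes

# Coarsen each equation on its cut-point grid; digitize gives the
# interval index, +1 makes it the 1-based category j_{i,k}.
cuts = [np.array([0.0]), np.array([-1.0, 1.5]),
        np.array([-1.0, 0.5]), np.array([-1.5, -0.5, 1.0])]
J = np.column_stack([np.digitize(Y[:, k], cuts[k]) + 1 for k in range(4)])
```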

The model is iteratively estimated by the FMOP method, where ${j}_{i,k}$ is the interval index chosen by respondent $i$ for equation $k$, $k = 1,\dots,4$. We start iteration $t = 0$ with $\beta =0$. We define the lower and upper residuals

$$\begin{align*}E\left(\varepsilon \left|\varepsilon \le \right.{\nu}_{j_{i,k}}^{(t)}-{\beta_k^{(t)}}^{\prime }{x}_{i,k}\right)={\underset{\sim }{\overline{e}}}_{j_{i,k}}^{(t)}=\frac{-\varphi \left({\nu}_{j_{i,k}}^{(t)}-{\beta_k^{(t)}}^{\prime }{x}_{i,k}\right)}{\Phi \left({\nu}_{j_{i,k}}^{(t)}-{\beta_k^{(t)}}^{\prime }{x}_{i,k}\right)}\\ {}E\left(\varepsilon \left|\varepsilon >\right.{\nu}_{j_{i,k}}^{(t)}-{\beta_k^{(t)}}^{\prime }{x}_{i,k}\right)={\tilde{\overline{e}}}_{j_{i,k}}^{(t)}=\frac{\varphi \left({\nu}_{j_{i,k}}^{(t)}-{\beta_k^{(t)}}^{\prime }{x}_{i,k}\right)}{1-\Phi \left({\nu}_{j_{i,k}}^{(t)}-{\beta_k^{(t)}}^{\prime }{x}_{i,k}\right)}\end{align*}$$

We define the sets of respondents ${S}_{k,j}^1,{S}_{k,j}^2$ ($k = 1,\dots,4$; $j = 1,\dots,{J}_k$) who are in the response categories $\le j$ or $>j$, respectively. We solve the equations

(5c.1) $$\begin{align}\sum \limits_{i\in {S}_{k,j}^1}{\underset{\sim }{\overline{e}}}_{j_{i,k}}^{(t)}+\sum \limits_{i\in {S}_{k,j}^2}{\tilde{\overline{e}}}_{j_{i,k}}^{(t)}=0,\qquad k=1,\dots, 4,\ j=1,\dots, {J}_k\end{align}$$

for ${\nu}_{k,j}^{(t)}$, yielding estimated cut-points ${\nu}_{k,j}^{(t)}$ in the $t$th iteration. These cut-points are substituted to define the generalized residuals. The estimated generalized residuals ${\overline{e}}_{j_{i,k}}^{(t)}$ in the $t$th iteration are

$$\begin{align*}{\overline{e}}_{j_{i,k}}^{(t)}=\frac{\varphi \left({\nu}_{j_{i,k}-1}^{(t)}-{\beta_k^{(t)}}^{\prime }{x}_{i,k}\right)-\varphi \left({\nu}_{j_{i,k}}^{(t)}-{\beta_k^{(t)}}^{\prime }{x}_{i,k}\right)}{\Phi \left({\nu}_{j_{i,k}}^{(t)}-{\beta_k^{(t)}}^{\prime }{x}_{i,k}\right)-\Phi \left({\nu}_{j_{i,k}-1}^{(t)}-{\beta_k^{(t)}}^{\prime }{x}_{i,k}\right)},\end{align*}$$

where, as before, ${j}_{i,k}$ is the interval index of respondent $i$ for equation $k$. We now define the $\left(K\times M\right)$ orthogonality conditions

(5c.2) $$\begin{align}\frac{1}{N}\sum \limits_{i=1}^N{x}_{i,k,m}{\overline{e}}_{j_{i,k}}^{(t)}=0,k=1,\dots, 4,m=1,\dots, M\end{align}$$
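A minimal Python sketch of these two building blocks may help to fix ideas (our own illustration; function names, array shapes, and the use of scipy's bracketing root-finder are our assumptions, not the authors' code). The first function evaluates the cut-point condition (5c.1) for one candidate $\nu$, the second computes the generalized residuals, and the third assembles the sample moments (5c.2).

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def cutpoint_condition(nu, xb, below):
    """Left-hand side of (5c.1) for one equation k and one category j.

    xb    : beta'x per respondent; below : True for respondents with response <= j.
    Sums E(eps | eps <= nu - xb) over the 'below' group and
    E(eps | eps > nu - xb) over the others; monotonically increasing in nu.
    """
    t = nu - xb
    lower = -norm.pdf(t) / norm.cdf(t)           # truncated mean from below
    upper = norm.pdf(t) / (1.0 - norm.cdf(t))    # truncated mean from above
    return np.sum(np.where(below, lower, upper))

def generalized_residuals(xb, j, cuts):
    """E(eps | observed category j) per respondent for one OP equation."""
    nu = np.concatenate(([-np.inf], cuts, [np.inf]))   # pad with +/- infinity
    lo, hi = nu[j - 1] - xb, nu[j] - xb                # j is the 1-based index
    return (norm.pdf(lo) - norm.pdf(hi)) / (norm.cdf(hi) - norm.cdf(lo))

def orthogonality_moments(X, ebar):
    """K x M sample moments (5c.2): mean over i of x_{i,k,m} * ebar_{i,k}."""
    return np.einsum('ikm,ik->km', X, ebar) / X.shape[0]

# Hypothetical usage for one cut-point, exploiting monotonicity in nu:
# nu_hat = brentq(cutpoint_condition, -5.0, 5.0, args=(xb, below))
```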

We now have two equation systems, (5c.1) and (5c.2), which are solved simultaneously. We then calculate the in-between error covariance matrix

$$\begin{align*}{\overline{\Sigma}}^{(t)}=\left({\overline{\sigma}}_{k,{k}^{\prime }}\right)=\frac{1}{N}\sum \limits_{i=1}^N{\overline{e}}_{j_{i,k}}^{(t)}\,{\overline{e}}_{j_{i,{k}^{\prime}}}^{(t)}.\end{align*}$$
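In the notation of the sketch above, this is a plain cross-product of the $(N\times K)$ residual matrix:

```python
import numpy as np

def in_between_covariance(ebar):
    """Sigma-bar^(t): (1/N) sum_i ebar_i ebar_i' from the (N, K) residuals."""
    return ebar.T @ ebar / ebar.shape[0]
```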

With the new ${\beta}^{\left(t+1\right)}$ and ${\nu}_{k,j}^{\left(t+1\right)}$ we repeat (5c.1) and (5c.2) and find new estimates, now weighting with the inverse of the in-between covariance matrix, that is, solving

$$\begin{align*}\sum \limits_{i=1}^N{\overline{e}}_{j_{i,k}}^{(t)}{\left[{\overline{\Sigma}}^{(t)}\right]}^{-1}{x}_{i,k}=0.\end{align*}$$

Finally, we estimate the covariance matrix of the estimators $\widehat{\overline{\beta}},\widehat{\overline{\nu}}$ by the well-known sandwich formula.
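A generic sketch of such a sandwich estimator (names and shapes are ours; here $A$ is the Jacobian of the stacked moment conditions and $B$ their outer-product covariance):

```python
import numpy as np

def sandwich_covariance(A, B, N):
    """Asymptotic covariance (1/N) A^{-1} B A^{-T} of a method-of-moments
    estimator, with A the moment Jacobian and B the moment covariance."""
    A_inv = np.linalg.inv(A)
    return A_inv @ B @ A_inv.T / N
```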

We estimate each non-diagonal element ${\sigma}_{k,{k}^{\prime }}$ of the latent full covariance matrix from the corresponding element ${\overline{\sigma}}_{k,{k}^{\prime }}$ of the in-between error covariance matrix. The method is described in detail in the Appendix.

We conclude that the method is stable across sample sizes $N$ and that its estimates do not differ significantly from the cmp estimates.

Table 7 Betas, standard errors, and error correlations (N = 10,000).

Table 8a Betas, standard errors, and error correlations (N = 5,000) (Base dataset of 10,000 cases; only every second case is used).

Note to Table 8a: the SUOP estimate 0.3014 falls outside the cmp 95% confidence interval [0.1462, 0.2916].

Table 8b Betas, standard errors, and error correlations (N = 2,000) (Base dataset of 10,000 cases; only every fifth case is used).

Table 8c Betas, standard errors, and error correlations (N = 1,000) (Base dataset of 10,000 cases; only every tenth case is used).

Note to Table 8c: the SUOP estimate 0.7347 falls outside the cmp 95% confidence interval [0.3862, 0.6583].

Table 8d Betas, standard errors, and error correlations (N = 1,000) (A newly created dataset of 1,000 cases).

Notes to Table 8d: the SUOP estimates 0.5582 and 0.4227 fall outside the cmp 95% confidence intervals [0.1834, 0.4989] and [−0.1869, 0.4199], respectively.

Table 8e Betas, standard errors, and error correlations (N = 1,000) (Again a newly created dataset of 1,000 cases).

Note to Table 8e: the SUOP estimate 0.5655 falls outside the cmp 95% confidence interval [−0.1068, 0.5155].

Table 8f The correlation matrix of the variables (N = 10,000).

6 Concluding remarks

In this paper, we suggest a new approach to the statistical analysis of ordinal data in which the errors are allowed to be correlated. The basic idea is that the ordinally observed dependent variables reflect latent continuously-valued random variables $Y$ and that the observations are coarsened according to an interval grid $\mathrm{\mathbb{C}}={\left\{\left({\nu}_{j-1},{\nu}_j\right]\right\}}_{j=1}^J$ of $Y$ on the real axis, where the unknown cut-points $\nu$ have to be estimated as well. Each observed category corresponds to one interval on the real axis. For cases where $Y$ is multidimensional, say of dimension $K$, and the errors are correlated, there is usually a formidable impediment: the usual ML estimation procedure requires the evaluation of likelihoods that are multivariate normal integrals over $K$-dimensional blocks, which is very cumbersome and time-consuming. In this paper we show that estimation of the latent model generating the coarsened observations of the dependent variables can be done in a much simpler way than usual, without the need for multi-dimensional integration or large-scale simulation.

In our approach, we start from the requirement that the difference between the observation $Y$ and its predictor ${\beta}^{\prime }X$, that is, $\left(Y-{\beta}^{\prime }X\right)$, cannot be further explained by $X$. This is translated into the zero-covariance conditions (2.5), which gives those conditions a significance of their own. When the errors are normally distributed, this coincides with the ML conditions.

The identifying moment conditions are found by replacing the residuals in the regular zero-covariance conditions for exact data with the corresponding generalized residuals for the ordinal data.

The approach closely resembles the traditional GLS- and SUR-approaches used to estimate linear models on exactly observed dependent variables.

For this method, an assumption about the marginal distributions of the error vector is required. We choose normality, which enables us to use (3.6). Although we restricted ourselves to normal errors, it is not difficult to generalize the method to other error distributions, of which the logistic and the lognormal are the foremost candidates (see, e.g., Maddala (Reference Maddala1983, p. 369) for the formulae of the generalized residuals in those cases). The estimation method remains unchanged.

This approach seems to close the gap between the analysis of exactly observable data and qualitative ordinal data smoothly. We saw in the above examples that the effect on variances (confidence bands and intervals) of SUOP compared to OP is in some cases small and in other cases large. The regression coefficients are mostly similar, which is not surprising as both estimators are consistent; the same holds for the comparison between traditional OLS and SUR estimates in traditional econometrics. The advantage lies in the possibility of exploiting the additional information supplied by the error correlations. Standard errors are assessed without assuming a specific structure of the covariance matrix before estimation. In the panel data example in Section 5.1 it appears that the standard deviations of the estimates double or more when error correlation is taken into account. Hence, in this case the reliability reduction implied by the error correlations is huge.

In this paper, we focus on FMOP and SUOP, the qualitative counterparts of FGLS and SUR. However, the method seems generally appropriate for two broad types of model estimation situations, characterized by Roodman (Reference Roodman2011) as:

“1) those in which a truly recursive data-generating process is posited and fully modeled, and

2) those in which there is simultaneity but instruments allow the construction of a recursive set of equations, as in two-stage least squares (2SLS).”

Our method may be compared with the methods based on the GHK algorithm developed by Cappellari and Jenkins (Reference Cappellari and Jenkins2003) and on simulated moments (Hajivassiliou & McFadden, Reference Hajivassiliou and McFadden1998; Roodman, Reference Roodman2007, 2020). Those methods aim at obtaining numerical estimates of the log-likelihood by simulation and finding, by varying the unknown parameters, which parameter values maximize the simulated log-likelihood of the sample. This requires the repeated evaluation of multiple normal integrals and makes the procedure time-consuming. In our approach, we do not need to evaluate multiple integrals or run large-scale simulations, and therefore the method is not restricted with respect to the size $K$ of the equation system. Moreover, we can handle an arbitrary number $J\left(>2\right)$ of outcome categories; we do not have to restrict ourselves to dichotomous (biprobit) data. The method can be used for any number of equations $K$ and any number of interval categories ${J}_k$. For instance, in our SUOP example (Section 5.2) we estimated eight equations, 75 effects, and 32 cut-points simultaneously. Direct observation is obviously a limiting case of coarsening, and consequently the method may also be used when the data set consists of a mixture of directly observed and ordinal data. Classical least-squares-based estimation methods on exactly observed data may be seen as a specific limiting case.

Our estimation method appears to require only a few minutes of computing time, which compares favorably with the traditional methods, each of which requires much more time. The method may be interpreted as a generalization of classical least-squares models for exact observations to the estimation of models on the basis of multidimensional ordered-probit-type observations.

In this paper, we restricted ourselves to the most straightforward OP observation mode. In a forthcoming study, we will generalize this approach to tackle the case where the sample consists of a mixture of categorical, censored, and exactly observed data.

Acknowledgments

We are grateful for helpful remarks by the referees.

Appendix

A Estimation of the latent error covariance matrix Σ

For the estimation of the latent covariance matrix, starting from the estimated $\widehat{b}$-estimates and the in-between covariance matrix $\widehat{\overline{\Sigma}}$, we propose the following method. To estimate the true error correlations ${\rho}_{k,{k}^{\prime }}$ by ${\widehat{\rho}}_{k,{k}^{\prime }}$ for two OP equations, we notice that for each pair $k,{k}^{\prime }$ we only have to look at the bivariate marginal distribution of ${\varepsilon}_k,{\varepsilon}_{k^{\prime }}$. Consider for instance a cluster of three observations (1, 2, 3). The sample likelihood is $P\left({\varepsilon}_1\in {S}_1,{\varepsilon}_2\in {S}_2,{\varepsilon}_3\in {S}_3;0,{\Sigma}_{1,2,3}\right)$, where ${S}_1=\left({\nu}_{j_i-1}-{x}_{i,1}\beta, {\nu}_{j_i}-{x}_{i,1}\beta \right]$ and ${S}_2$ and ${S}_3$ are similarly defined. However, the bivariate marginal likelihood provides the same information on ${\sigma}_{1,2}$ as the trivariate full likelihood. In the $K$-dimensional covariance matrix there are $K\left(K-1\right)/2$ distinct non-diagonal elements. For instance, for $K$ = 8, as in the model in Section 5.2, this boils down to 28 simple two-dimensional estimations.

Applying a Gram–Schmidt decomposition, for the pair ${\varepsilon}_k,{\varepsilon}_{k^{\prime }}$ we have

(A.1) $$\begin{align}{\varepsilon}_{i,{k}^{\prime }}={\rho}_{k,{k}^{\prime }}.{\varepsilon}_{i,k}+{\eta}_{i,k}\sqrt{1-{\rho}_{k,{k}^{\prime}}^2}\end{align}$$

where ${\varepsilon}_{i,k},{\eta}_{i,k}$ are independent $N\left(0,1\right)$ drawings.Footnote 7 The coefficient $\sqrt{1-{\rho}_{k,{k}^{\prime}}^2}$ guarantees that ${\sigma}^2\left({\varepsilon}_{i,{k}^{\prime }}\right)=1$, and we have $E\left({\varepsilon}_{i,k}{\varepsilon}_{i,{k}^{\prime }}\right)={\rho}_{k,{k}^{\prime }}$. When we consider the estimated in-between error covariance ${\overline{\rho}}_{k,{k}^{\prime }}=E\left({\overline{\varepsilon}}_{i,k},{\overline{\varepsilon}}_{i,{k}^{\prime }}\right)$, its value depends only on the two one-dimensional grids $\mathbb{C}_{i,k}$ and $\mathbb{C}_{i,{k}^{\prime }}$ and on ${\rho}_{k,{k}^{\prime }}$. We may now simulate ${\varepsilon}_{i,k},{\varepsilon}_{i,{k}^{\prime }}$ for various values of ${\rho}_{k,{k}^{\prime }}$ by means of (A.1) and calculate the corresponding in-between covariances ${\overline{\rho}}_{k,{k}^{\prime }}=\frac{1}{N}\sum \nolimits_{i=1}^N{\overline{e}}_{i,k}{\overline{e}}_{i,{k}^{\prime }}$ corresponding to the grids $\mathbb{C}_{i,k}$ and $\mathbb{C}_{i,{k}^{\prime }}$. We notice that the dependent variable ${Y}_{i,k}$ is observed according to a uniform, $i$-independent grid $\mathbb{C}_k={\left\{\left({\nu}_{j-1}^{(k)},{\nu}_j^{(k)}\right]\right\}}_{j=1}^{J_k}={\left\{{\mathtt{S}}_j^{(k)}\right\}}_{j=1}^{J_k}$, while the errors ${\varepsilon}_{i,k}$ are observed according to $i$-dependent individual grids

$$\begin{align*}\mathbb{C}_{i,k}={\left\{\left({\nu}_{j-1}^{(k)}-{x}_{i,k}\beta, {\nu}_j^{(k)}-{x}_{i,k}\beta \right]\right\}}_{j=1}^{J_k}={\left\{{\mathtt{S}}_j^{(k)}-{x}_{i,k}\beta \right\}}_{j=1}^{J_k}.\end{align*}$$

We consider the set of all two-dimensional grids over all observation units, ${\left\{\left(\mathbb{C}_{i,k},\mathbb{C}_{i,{k}^{\prime }}\right)\right\}}_{i=1}^N=\mathbb{C}_{k,{k}^{\prime }}$.

We write for the sample in-between covariance

$$\begin{align*}{\overline{\rho}}_{k,{k}^{\prime }}=\frac{1}{N}\sum \limits_{i=1}^N{\overline{e}}_{i,k}\left({\rho}_{k,{k}^{\prime }}\right)\,{\overline{e}}_{i,{k}^{\prime }}\left({\rho}_{k,{k}^{\prime }}\right)\overset{\mathrm{def}}{=}f\left({\rho}_{k,{k}^{\prime }}\right).\end{align*}$$

We estimate the value of the latent ${\rho}_{k,{k}^{\prime }}$ by comparing the observed sample in-between covariance ${\widehat{\overline{\rho}}}_{k,{k}^{\prime }}$ with its simulated counterpart ${\overline{\rho}}_{k,{k}^{\prime }}$ for various values of the latent ${\rho}_{k,{k}^{\prime }}$. It appears that there is one value ${\widehat{\rho}}_{k,{k}^{\prime }}$ solving $f\left({\rho}_{k,{k}^{\prime }}\right)={\widehat{\overline{\rho}}}_{k,{k}^{\prime }}$. To gain insight into the relationship between the latent ${\rho}_{k,{k}^{\prime }}$ and the corresponding in-between covariance ${\overline{\rho}}_{k,{k}^{\prime }}$, we performed simulation experiments for three different two-dimensional grids; Table A1 presents the resulting ${\overline{\rho}}_{k,{k}^{\prime }}$ values for different values of ${\rho}_{k,{k}^{\prime }}$. We describe one example in detail. Consider the two-dimensional grid with $\mathbb{C}_1=\left\{\left(-\infty, 0\right],\left(0,\infty \right)\right\}$, $\mathbb{C}_2=\left\{\left(-\infty, 0\right],\left(0,\infty \right)\right\}$ in the middle part of Table A1. We simulate a sample from a two-dimensional normal distribution with correlation $\rho$ = 0.1 and find an in-between covariance $\overline{\rho}$ = 0.043. For $\rho$ = 0.2 we find a corresponding value of $\overline{\rho}$ = 0.087, and so on. We found that the function $f\left({\rho}_{k,{k}^{\prime }}\right)={\overline{\rho}}_{k,{k}^{\prime }}$ is monotonically increasing in ${\rho}_{k,{k}^{\prime }}$ for all grids we tested; see also Aitkin (Reference Aitkin1964). We conclude that for a given grid $\left(\mathbb{C}_1,\mathbb{C}_2\right)$ the function $f\left(\rho \right)=\overline{\rho}$ is monotonically increasing in $\rho$.

Table A1 Relation between covariance $\rho$ and in-between covariance $\overline{\rho}$ .
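A sketch of this simulation in Python (our own illustration; for simplicity the grids are taken $i$-independent, i.e., the pure error grids, and function names are ours):

```python
import numpy as np
from scipy.stats import norm

def f_rho(rho, cuts1, cuts2, R=200_000, seed=7):
    """Simulated in-between covariance rho-bar for a given latent rho.

    Draws (eps, eta) i.i.d. N(0,1), builds the correlated pair via (A.1),
    coarsens each margin on its grid, and averages the product of the
    per-cell generalized residuals.
    """
    rng = np.random.default_rng(seed)
    e1 = rng.standard_normal(R)
    e2 = rho * e1 + np.sqrt(1.0 - rho**2) * rng.standard_normal(R)

    def cell_residual(e, cuts):
        nu = np.concatenate(([-np.inf], cuts, [np.inf]))
        j = np.digitize(e, cuts)                 # 0-based cell index
        lo, hi = nu[j], nu[j + 1]
        return (norm.pdf(lo) - norm.pdf(hi)) / (norm.cdf(hi) - norm.cdf(lo))

    return np.mean(cell_residual(e1, cuts1) * cell_residual(e2, cuts2))

# The median-split grid from the middle of Table A1: f(0.1) comes out of
# the same order as the 0.043 reported in the text.
print(f_rho(0.1, np.array([0.0]), np.array([0.0])))
```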

The solution ${\widehat{\rho}}_{k,{k}^{\prime }}$ of $f\left({\rho}_{k,{k}^{\prime }}\right)={\widehat{\overline{\rho}}}_{k,{k}^{\prime }}$ is, by Slutsky’s theorem, a consistent estimator of the population parameter ${\rho}_{k,{k}^{\prime }}$. Doing this for each non-diagonal element, we estimate the underlying non-diagonal elements of the full correlation matrix $\Sigma$; the diagonal elements are equal to one by assumption. This yields a consistent estimator ${\widehat{\Sigma}}_{\varepsilon }$ of the error covariance matrix ${\Sigma}_{\varepsilon }$. Confidence intervals can be found by the delta method. Figure A1 shows the graph of the function $f\left({\rho}_{k,{k}^{\prime }}\right)$ for the left grid in Table A1.

Figure A1 $\overline{\rho}$ as a function of $\rho$.
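Given the monotonicity, the inversion can be sketched with a bracketing root-finder, reusing f_rho from the previous sketch (estimate_rho is our hypothetical name):

```python
from scipy.optimize import brentq

def estimate_rho(rho_bar_hat, cuts1, cuts2):
    """Invert the monotone map f: rho -> rho-bar to recover the latent rho."""
    return brentq(lambda r: f_rho(r, cuts1, cuts2) - rho_bar_hat,
                  -0.999, 0.999)
```

Looping estimate_rho over all $K(K-1)/2$ pairs then fills the non-diagonal elements of $\widehat{\Sigma}$.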

In Table 4 we presented the full correlation matrices calculated by SUOP and by cmp. In Table A2 below we present the in-between covariance matrix and the full covariance matrix for the five-year panel considered in Section 5.1. We note that the grid-wise observation causes a considerable loss of information: about 60% of the information is lost. The estimates are not based on structural assumptions such as “random effects” or errors whose correlations are specific functions of the time difference. In fact, this result could be used to estimate and test specific functional specifications of the covariance matrix. From Table A2 it is already obvious that “random effects” must be rejected for this panel structure, because in that case the non-diagonal elements would have to be roughly equal to each other.

Table A2 In-between and full covariance matrices for the five-year panel.

B The complete Table B1

Table B1 Comparison of the parameter estimates and their s.e.’s for Ordered Probit, Method of Moments, and Maximum Likelihood.

Footnotes

J.P.H. passed away on August 2, 2024.

1 Capitals will be used for random variables, vectors, and matrices. We denote the zero-vector by o = (0,…,0). Disturbance terms will be in Greek letters. Roman letters will be used for constants and realizations of random variables. Matrices will be denoted by capitals as well to conform to the traditional regression formulas. Indexes will be suppressed where the interpretation will be clear from the context. The ML-estimator of a parameter vector $\theta$ is denoted by $\widehat{\theta}$ . The corresponding estimator from coarsened data is denoted by $\widehat{\overline{\theta}}$ . An overbar above a symbol denotes a sample average.

2 The original least squares method is due to Gauss (Reference Gauss1809) and Legendre (Reference Legendre1805), “Sur la méthode des moindres quarrés”. See also Stigler (Reference Stigler1981) for a historical survey.

3 The important issue of whether the coarsened sample data $\left({\overline{Y}}_i,{X}_i\right)$ contain sufficient information to identify estimators of $\beta$ and the unknown cut-points in ${S}_j$ is considered in Greene and Hensher (Reference Greene and Hensher2010) among others.

4 Notice that $\frac{\partial }{\partial \beta}\overline{\varepsilon}=\frac{\partial }{\partial \beta }E\left(\varepsilon \left|{\nu}_{j-1}- x\beta <\varepsilon \le {\nu}_j- x\beta, X=x\right.\right)=\frac{\partial }{\partial \beta}\left[E\left(y\left|{\nu}_{j-1}<y\le {\nu}_j, X=x\right.\right)- x\beta \right]=x$.

5 This trick, called “binarization”, is suggested by, e.g., Muris (Reference Muris2017), “Estimation in the fixed-effects ordered logit model,” The Review of Economics and Statistics, 99(3), 465–477. The term has been used before in computer science.

6 The cmp-procedure provides confidence intervals for the correlation estimates.

7 We use standard normal draws for ${\varepsilon}_{i,k},{\eta}_{i,{k}^{\prime }}$ , because in (A.1) the simulated errors have to be $N\left(0,1\right)$ .

References

Aitkin, M. A. (1964). Correlation in a singly truncated bivariate normal distribution. Psychometrika, 29, 263–270.
Atzmüller, C., & Steiner, P. M. (2010). Experimental vignette studies in survey research. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 6, 128–138.
Amemiya, T. (1985). Advanced econometrics. Harvard University Press.
Bliss, C. L. (1934). The method of probits. Science, 79, 38–39.
Bollen, K. A. (1989). Structural equations with latent variables. Wiley & Sons.
Cameron, A. C., & Trivedi, P. K. (2005). Microeconometrics: Methods and applications. Cambridge University Press.
Cappellari, L., & Jenkins, S. P. (2003). Multivariate probit regression using simulated maximum likelihood. The Stata Journal, 3, 278–294.
Casella, G., & George, E. (1992). Explaining the Gibbs sampler. American Statistician, 46(3), 167–174.
Cramer, J. S. (2004). The early origins of the logit model. Studies in History and Philosophy of Biological and Biomedical Sciences, 35, 613–626.
Dempster, A. P., Laird, N. M., & Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B (Methodological), 39(1), 1–38.
Duncan, O. D. (1975). Introduction to structural equation models. Academic Press.
Finney, D. (1947). Probit analysis: A statistical treatment of the sigmoid response curve. Macmillan.
Finney, D. (1971). Probit analysis. Cambridge University Press.
Fok, D. (2017). Advanced individual demand models. In Leeflang, P. S. H., Wieringa, J. E., Bijmolt, T. H. A., & Pauwels, K. H. (Eds.), Advanced methods for modeling markets. Springer.
Gauss, C. F. (1809). Theoria motus corporum coelestium in sectionibus conicis solem ambientium.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6(6), 721–741. doi:10.1109/TPAMI.1984.4767596
Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration. Econometrica, 57, 1317–1339.
Goldberger, A. S. (1968). Topics in regression analysis. MacMillan.
Goldberger, A. S. (1971). Econometrics and psychometrics: A survey of communalities. Psychometrika, 36, 83–107. https://doi.org/10.1007/BF02291392
Goldberger, A. S. (1991). A course in econometrics. Harvard University Press.
Gouriéroux, C., Monfort, A., Renault, E., & Trognon, A. (1987). Generalised residuals. Journal of Econometrics, 34(1–2), 5–32.
Green, P., & Srinivasan, V. (1978). Conjoint analysis in consumer research: Issues and outlook. Journal of Consumer Research, 5, 103–123.
Greene, W. H., & Hensher, D. A. (2010). Modeling ordered choices: A primer. Cambridge University Press.
Greene, W. H. (2018). Econometric analysis (8th ed.). Pearson.
Hajivassiliou, V., & McFadden, D. (1998). The method of simulated scores for the estimation of LDV models. Econometrica, 66, 863–896.
Hayduk, L. (1987). Structural equation modeling with LISREL: Essentials and advances. Johns Hopkins University Press.
Hansen, L. P. (1982). Large sample properties of generalized method of moments estimators. Econometrica, 50, 1029–1054.
Heckman, J. (1976). The common structure of statistical models of truncation, sample selection and limited dependent variables and a simple estimator for such models. Annals of Economic and Social Measurement, 5(4), 475–492.
Hensher, D. A., Rose, J. M., & Greene, W. H. (2015). Applied choice analysis (2nd ed.). Cambridge University Press.
Hood, W. C., & Koopmans, T. C. (Eds.) (1953). Studies in econometric method. Wiley & Sons.
Johnson, N. L., Kotz, S., & Balakrishnan, N. (1994). Continuous univariate distributions (Vol. 1). Wiley.
Jöreskog, K. G. (1973). A general method for estimating a linear structural equation system. In Goldberger, A. S., & Duncan, O. D. (Eds.), Structural equation models in the social sciences (pp. 85–112). Academic Press.
Keane, M. (1994). A computationally practical simulation estimator for panel data. Econometrica, 62(1), 95–116.
Legendre, A. M. (1805). Nouvelles méthodes pour la détermination des orbites des comètes. Firmin Didot.
Liu, D., Li, S., Yu, Y., & Moustaki, I. (2021). Assessing partial association between ordinal variables: Quantification, visualization, and hypothesis testing. Journal of the American Statistical Association, 116(534), 955–968. https://doi.org/10.1080/01621459.2020.1796394
Maddala, G. S. (1983). Limited dependent and qualitative variables in econometrics. Cambridge University Press.
Manski, C. F. (1988). Analog estimation methods in econometrics. Chapman and Hall.
Maris, E. (1995). Psychometric latent response models. Psychometrika, 60, 523–547.
McFadden, D. (1974). Conditional logit analysis of qualitative choice behavior. In P. Zarembka (Ed.), Frontiers in econometrics (pp. 105–142). Academic Press.
McFadden, D. (1989). A method of simulated moments for estimation of discrete response models without numerical integration. Econometrica, 57(5), 995–1026.
McKelvey, R., & Zavoina, W. (1975). A statistical model for the analysis of ordinal level dependent variables. Journal of Mathematical Sociology, 4, 103–120.
Moss, J., & Grønneberg, S. (2023). Partial identification of latent correlations with ordinal data. Psychometrika, 88, 241–252.
Mullahy, J. (2016). Estimation of multivariate probit models via bivariate probit. The Stata Journal, 16(1), 37–51.
Muris, C. (2017). Estimation in the fixed-effects ordered logit model. The Review of Economics and Statistics, 99(3), 465–477.
Narayanan, A. (2012). A review of eight software packages for structural equation modeling. American Statistician, 66, 129–138.
Olsson, U. (1979). Maximum likelihood estimation of the polychoric correlation coefficient. Psychometrika, 44(4), 443–460.
Roodman, D. (2007, revised 2020). CMP: Stata module to implement conditional (recursive) mixed process estimator. Statistical Software Components S456882, Boston College Department of Economics. https://econpapers.repec.org/software/bocbocode/s456882.htm
Roodman, D. (2011). Fitting fully observed recursive mixed-process models with cmp. The Stata Journal, 11(2), 159–206.
Stigler, S. M. (1981). Gauss and the invention of least squares. The Annals of Statistics, 9(3), 465–474.
Van Beek, K. W. H., Koopmans, C. C., & van Praag, B. M. S. (1997). Shopping at the labour market: A real tale of fiction. European Economic Review, 41(2), 295–317.
Van de Ven, W. P. M. M., & van Praag, B. M. S. (1981). The demand for deductibles in private health insurance: A probit model with sample selection. Journal of Econometrics, 17(2), 229–252.
Verbeek, M. (2017). A guide to modern econometrics (5th ed.). Wiley & Sons.
Wallander, L. (2009). 25 years of factorial surveys in sociology: A review. Social Science Research, 38, 505–520.
Wiberg, M., Culpepper, S., Janssen, R., González, J., & Molenaar, D. (2019). A taxonomy of item response models in Psychometrika. Quantitative Psychology, 265, 13–23.
World Happiness Report (2024). Oxford University.