Published online by Cambridge University Press: 20 May 2019
Classical generalized linear models assume that marginal effects are homogeneous in the population given the observed covariates. Researchers can never be sure a priori whether that assumption is adequate. Recent literature in statistics and political science has proposed models that use Dirichlet process priors to deal with the possibility of latent heterogeneity in the covariate effects. In this paper, we extend and generalize those approaches and propose a hierarchical Dirichlet process of generalized linear models in which the latent heterogeneity can depend on context-level features. Such a model is important in comparative analyses when the data come from different countries and the latent heterogeneity can be a function of country-level features. We provide a Gibbs sampler for the general model, a special Gibbs sampler for Gaussian outcome variables, and a Hamiltonian Monte Carlo within Gibbs sampler to handle discrete outcome variables. We demonstrate the importance of accounting for latent heterogeneity with a Monte Carlo exercise and with two applications that replicate recent scholarly work. We show how Simpson’s paradox can emerge in the empirical analysis if latent heterogeneity is ignored and how the proposed model can be used to estimate heterogeneity in the effect of covariates.
Author’s note: The author is thankful to Robert Franzese, Walter Mebane, Kevin Quinn, and Long Nguyen, as well as participants at the 2018 Polmeth and 2018 APSA Annual Meeting, for helpful comments on previous versions of this manuscript. The author also thanks the editor Jeff Gill and two anonymous reviewers for their invaluable suggestions. Replication materials are publicly available on the Political Analysis Harvard Dataverse (Ferrari 2018) as well as on the author’s website.
Contributing Editor: Jeff Gill