Adding Regularized Horseshoes to the Dynamics of Latent Variable Models

Garret Binding; Piotr Koc

doi:10.1017/pan.2024.30

Published online by Cambridge University Press: 17 January 2025

Garret Binding*: Institute of Political Science, University of Basel, Basel, Switzerland; Department of Political Science, University of Zurich, Zurich, Switzerland
Piotr Koc: GESIS – Leibniz Institute for the Social Sciences, Mannheim, Germany; Institute of Philosophy and Sociology of the Polish Academy of Sciences, Warsaw, Poland
*Corresponding author: Garret Binding; Email: [email protected]

Abstract

Dynamic latent variable models generally link units’ positions on a latent dimension over time via random walks. Theoretically, these trajectories are often expected to resemble a mixture of periods of stability interrupted by moments of change. In these cases, a prior distribution such as the regularized horseshoe—that allows for both stasis and change—can prove a better theoretical and empirical fit for the underlying construct than other priors. Replicating Reuning, Kenwick, and Fariss (2019), we find that the regularized horseshoe performs better than the standard normal and the Student’s t-distribution when modeling dynamic latent variable models. Overall, the use of the regularized horseshoe results in more accurate and precise estimates. More broadly, the regularized horseshoe is a promising prior for many similar applications.

Keywords

Bayesian inference; item response theory; latent variable models

Type: Letter
Information: Political Analysis, Volume 33, Issue 2, April 2025, pp. 171–177

DOI: https://doi.org/10.1017/pan.2024.30
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (https://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2025. Published by Cambridge University Press on behalf of The Society for Political Methodology

1 Introduction

When modeling dynamic trends in latent variable models, researchers typically use random walks to link observations from one time-period to the next: tomorrow’s position on a latent dimension is today’s position plus a random effect (see, e.g., Caughey and Warshaw 2015; König, Marbach, and Osnabrügge 2013; Martin and Quinn 2002; Schnakenberg and Fariss 2014). Often, the expected trajectory of the political constructs modeled in such a manner (such as countries’ democracy scores) is a mixture of stasis and change: periods of stability are interrupted by moments of rupture (see also Reuning et al. 2019, 504).

In this letter, we suggest the use of the regularized horseshoe (RHS; Piironen and Vehtari 2017) as a prior distribution for dynamic latent variable models. The RHS stems from the Bayesian literature on sparsity-inducing priors (Carvalho, Polson, and Scott 2010; Polson and Scott 2011): draws from the distribution are either shrunk toward 0 (stasis) or allowed to deviate from 0 (change). This correspondence between the prior’s properties and expectations about the construct makes the RHS an ideal candidate for these applications.

Below, we introduce the regularized horseshoe as a useful addition to the distributions often employed in political science¹ and highlight its utility in the domain of latent variable models. We build on Reuning et al. (2019; hereafter RKF), who assessed the performance of the standard normal and the Student’s t-distribution (the two distributions generally used in these applications) in a similar endeavor. We show that using the RHS leads to better results than using either of the other two distributions: model specifications using the RHS produce more accurate and more precise estimates, as well as better model fit.

2 Motivation

The standard normal distribution as well as the Student’s t-distribution are sensible starting points for the random walks of dynamic models. Both concentrate their probability around 0 (stasis), while also allowing for deviations from 0 (change). Their empirical density functions are shown in Figure 1. The key difference between the tails of these two distributions is best visible in panels (b) and (c). As the tails of the t-distribution are determined via the degrees of freedom ($\nu$), their decay can be more gradual than that of a standard normal (here, we show draws from $StudentT(4, 0, 1)$ in line with RKF, where the parameters of the t-distribution are $\nu$, location, and scale).

Figure 1 Empirical density functions.

While the two distributions differ in their tails, they are similar in their center: a gradual decline from a peak at 0 assigns a lot of probability to small deviations from the center. Within the Bayesian literature, distributions such as the horseshoe have been suggested that exhibit a pronounced peak at 0 to induce sparsity in the posterior distributions of parameters (Carvalho et al. 2010; Polson and Scott 2011). Essentially, the horseshoe prior is a continuous mixture of two components with one component (potentially) shrinking values towards 0 ($\tau$), and another component allowing for large deviations from 0 ($\lambda$).²

More recently, Piironen and Vehtari (2017) proposed a regularized version of the horseshoe (RHS) to address the heavy tails of the horseshoe prior. By adding a hyperparameter $c^2$ to the horseshoe’s parameterization, the extent of deviations from 0 can be regularized. Formally, the regularized horseshoe distribution for a parameter $\gamma$ is constructed in the following manner:

$$ \begin{align*} & \gamma_{i} \sim N(0, \tau^2\tilde{\lambda}_{i}^2), \quad \tilde{\lambda}_{i}^2 = \frac{c^2\lambda_{i}^2}{c^2 + \tau^2\lambda_{i}^2}, \\ & \tau \sim StudentT^+(1, 0, 1), \\ &\lambda_{i} \sim StudentT^+(1, 0, 1) \quad \forall i = 1, \ldots, I, \end{align*} $$

where $i$ indexes elements of the vector $\gamma$. As discussed by Piironen and Vehtari (2017), $c^2$ is given a prior too:

$$ \begin{align*} c^2 \sim InvGamma(\alpha, \beta), \quad \alpha = \nu/2, \quad \beta = \nu s^2/2, \end{align*} $$

which corresponds to a $StudentT^+(\nu, 0, s^2)$ prior. We set $\nu = s = 1$, so that $c^2 \sim InvGamma(0.5, 0.5)$.³ If $c^2$ is dropped and $\gamma_{i} \sim N(0, \tau^2\lambda_{i}^2)$, the prior corresponds to the original horseshoe.
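The construction above can be illustrated with a small prior simulation. The following is a minimal sketch in plain Python, not the authors' Stan implementation; the function names (`half_cauchy`, `inv_gamma`, `draw_rhs`) are hypothetical, and it assumes a single global $\tau$ and a single $c^2$ shared by all elements of $\gamma$, with $StudentT^+(1, 0, 1)$ drawn as a standard half-Cauchy.

```python
import math
import random


def half_cauchy(rng):
    """Draw from StudentT+(1, 0, 1), a standard half-Cauchy, via the inverse CDF."""
    return abs(math.tan(math.pi * (rng.random() - 0.5)))


def inv_gamma(rng, alpha, beta):
    """Draw from InvGamma(alpha, beta) as the reciprocal of a Gamma(shape=alpha, rate=beta) draw."""
    # random.gammavariate is parameterized by (shape, scale), so scale = 1 / rate.
    return 1.0 / rng.gammavariate(alpha, 1.0 / beta)


def draw_rhs(rng, n, nu=1.0, s=1.0):
    """Return n RHS prior draws, their conditional standard deviations, and c^2."""
    tau = half_cauchy(rng)                             # global scale (assumed shared)
    c2 = inv_gamma(rng, nu / 2.0, nu * s ** 2 / 2.0)   # slab variance c^2
    gamma, prior_sd = [], []
    for _ in range(n):
        lam = half_cauchy(rng)                         # local scale lambda_i
        lam_tilde2 = c2 * lam ** 2 / (c2 + tau ** 2 * lam ** 2)
        sd = tau * math.sqrt(lam_tilde2)               # sd of gamma_i given the scales
        prior_sd.append(sd)
        gamma.append(rng.gauss(0.0, sd))
    return gamma, prior_sd, c2


if __name__ == "__main__":
    draws, sds, c2 = draw_rhs(random.Random(2024), n=5)
    print(draws, c2)
```

The sketch makes the role of $c^2$ visible: because $\tau^2\tilde{\lambda}_{i}^2 = c^2\tau^2\lambda_{i}^2 / (c^2 + \tau^2\lambda_{i}^2)$, the conditional variance of each $\gamma_{i}$ can never exceed $c^2$, whereas under the original horseshoe it is unbounded.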

The distribution of the RHS is also shown in Figure 1. More probability is assigned to values at or around 0, with a far steeper decline from the spike at the center than the more gradual descent of the other distributions. Simultaneously, the tails of the RHS are fatter, with decay rates that are first similar to and then lower than those of the t-distribution (visible in panel (c)). These two attributes of the distribution, a spike at 0 combined with fat tails, make the RHS an ideal a priori candidate to model the theoretically expected mixture of stasis and change in latent constructs over time. Its use should result in a posterior distribution that is both more accurate (as it better reflects the expected trajectory) and more precise (as its probability is more concentrated) than the posteriors obtained under the other two priors. Both aspects are important in applications, especially if estimates are used in subsequent models that propagate measurement uncertainty (rather than focusing on point estimates only; see Tai, Hu, and Solt 2024).

3 Model Specification

All subsequent empirical analyses are based on the model specifications and priors described in this section (unless mentioned otherwise). In line with RKF, we use Bayesian inference implemented in Stan (Carpenter et al. 2017) to approximate the posterior distribution of our model.

Following RKF, the likelihood for a unidimensional binary item response theory (IRT) model for units $i = 1, \ldots, I$, items $k = 1, \ldots, K$, and time-points $t = 1, \ldots, T$ is

$$ \begin{align*}\mathcal{L} = \prod_{i,t=1}^{I,T} \prod_{k=1}^K \Lambda(\alpha_k - \beta_k \theta_{i,t})^{y_{ikt}} \left(1-\Lambda(\alpha_k - \beta_k \theta_{i,t})\right)^{1-y_{ikt}},\end{align*} $$

where $\theta_{i,t}$ is a unit’s ideal point on a latent dimension at a given time-point, $\alpha_k$ and $\beta_k$ are an item’s difficulty and discrimination parameters, and $\Lambda$ is the logistic function. $y_{ikt}$ is an observed binary outcome for a unit $\times$ item $\times$ time-point combination. The latent model is identified by constraining the discrimination parameters to be positive (rotational invariance), and by fixing the mean and standard deviation of the ideal points to 0 and 1 (local and scalar invariance; see Bafumi et al. 2005). We assign standard (half) normal distributions to the vectors $\vec{\alpha}$ and $\vec{\beta}$ of length $K$, and to the vector $\vec{\theta}_{t=1}$ of length $I$: $\vec{\alpha} \sim N(0,1)$, $\vec{\beta} \sim N^+(0,1)$, and $\vec{\theta}_{t=1} \sim N(0,1)$.
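To make the likelihood concrete, the following is a minimal sketch of its logarithm in plain Python. It is an illustration rather than the authors' Stan code: the function names and the nested-list layout `y[i][t][k]` are assumptions made here, and the sketch evaluates the likelihood only, without the identification constraints or priors.

```python
import math


def log_inv_logit(x):
    """Numerically stable log of the logistic function Lambda(x)."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def irt_log_likelihood(y, alpha, beta, theta):
    """Log-likelihood of the binary IRT model.

    y[i][t][k] is the 0/1 response of unit i at time-point t on item k
    (an assumed layout), alpha[k] and beta[k] are the item parameters,
    and theta[i][t] is the latent position of unit i at time-point t.
    """
    total = 0.0
    for i, unit in enumerate(y):
        for t, responses in enumerate(unit):
            for k, y_ikt in enumerate(responses):
                eta = alpha[k] - beta[k] * theta[i][t]
                # y = 1 contributes log Lambda(eta);
                # y = 0 contributes log(1 - Lambda(eta)) = log Lambda(-eta).
                total += log_inv_logit(eta) if y_ikt == 1 else log_inv_logit(-eta)
    return total


if __name__ == "__main__":
    # One unit, one time-point, two items, latent position at 0.
    print(irt_log_likelihood([[[1, 0]]], [0.0, 0.0], [1.0, 1.0], [[0.0]]))
```

Working on the log scale avoids the numerical underflow that the product over all unit, time-point, and item combinations would otherwise cause.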

The dynamic nature of the model is ensured by specifying yesterday’s $\theta_{i,t-1}$ as the expectation of today’s $\theta_{i,t}$, so that:

$$ \begin{align*}\theta_{i,t} = \theta_{i,t-1} + \gamma_{i,t} \sigma \quad \forall i = 1, \ldots, I \quad \& \quad \forall t = 2, \ldots, T,\end{align*} $$

where $\gamma_{i,t}$ are the random effects over time and $\sigma$ is a smoothing hyperparameter, with a standard half-normal prior, that governs how much the past informs the present: $\sigma \sim N^+(0,1)$.⁴

The primary difference between the models compared below is the prior used for $\gamma_{i,t}$.⁵ We compare the results from models using three different prior distributions for $\gamma_{i,t}$: (i) a specification using the standard normal distribution ($N(0,1)$), (ii) one using the Student’s t-distribution (with $\nu = 4$; $StudentT(4, 0, 1)$), and (iii) our suggested specification with the regularized horseshoe (as detailed in Section 2 above).

4 Results

In this section, we replicate the simulation study of RKF and their re-analysis of Pemstein, Meserve, and Melton (2010), who estimated the trajectories of countries’ democracy scores, a construct likely to exhibit stasis and change across time.⁶ We limit ourselves to a selection of quantities of interest here (focusing on the accuracy and precision of estimates as well as model fit). We present more background and quantities of interest in the SI (Section B focuses on the simulation study, Section C on the application). The results there corroborate the results here.

Figure 2 Simulation results. Differences shown along the vertical axis, relative improvement alongside the median lines. For example, the correlation estimated via the RHS is 1% larger than via the standard normal. Note that the difference for the ELPD is positive although the relative improvement is smaller than 100% because ELPD is measured on a negative scale.

4.1 Simulation Study

RKF simulate data according to the model specification in Section 3. At each time-point, $\theta_{i,t}$ is either $\theta_{i,t-1} + \gamma_{i,t} \sigma$ (with $\gamma_{i,t} \sim N(0,1)$) or a new draw from $N(0,1)$. This induces breaks in the otherwise more gradual trajectory of $\theta$ over time. RKF vary the extent of smoothing over time (via $\sigma$) as well as the probability of a break, set $I = T = 50$ and $K = 5$, and generate 25 datasets per condition. This results in a total of 225 datasets.
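The data-generating process for the latent trajectories can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions, not RKF's simulation code: the function name `simulate_theta` is hypothetical, the values of `sigma` and `p_break` in the usage example are placeholders rather than RKF's actual conditions, and the sketch omits both the standardization of $\theta$ and the generation of item responses.

```python
import random


def simulate_theta(rng, n_units, n_times, sigma, p_break):
    """Simulate latent trajectories: a random walk interrupted by occasional breaks."""
    theta = []
    for _ in range(n_units):
        path = [rng.gauss(0.0, 1.0)]              # theta_{i,1} ~ N(0, 1)
        for _ in range(1, n_times):
            if rng.random() < p_break:
                # Break: the new position is a fresh draw from N(0, 1).
                path.append(rng.gauss(0.0, 1.0))
            else:
                # Stasis or gradual change: random-walk step scaled by sigma.
                gamma = rng.gauss(0.0, 1.0)       # gamma_{i,t} ~ N(0, 1)
                path.append(path[-1] + gamma * sigma)
        theta.append(path)
    return theta


if __name__ == "__main__":
    # Placeholder condition: I = T = 50 as in the text; sigma and p_break are illustrative.
    theta = simulate_theta(random.Random(42), n_units=50, n_times=50,
                           sigma=0.1, p_break=0.05)
    print(theta[0][:5])
```

With a small `sigma` and a nonzero `p_break`, most steps are tiny while a few are large, which is the mixture of stasis and change that the spike and the fat tails of the RHS are meant to capture.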

The main results are shown in Figure 2. We focus on four quantities of interest: the correlation between $\hat{\theta}$ and $\theta$ (using Pearson’s $\rho$), the root mean squared error (RMSE) of $\hat{\theta}$, the size of the standard errors (SE) around $\hat{\theta}$, and the expected log pointwise predictive density (ELPD) based on leave-one-out cross-validation (LOO-CV) as a measure of model fit (Vehtari et al. 2017). We set the RHS as the reference distribution, subtract the values of the other two specifications (indicated along the vertical axis), and calculate the median relative improvement of the RHS (shown alongside the median line). Correlation and model fit using the RHS are larger, while RMSE and standard errors are smaller. The relative improvements in RMSE and SE are larger (between 2.3% and 7.3%) than in correlation or model fit (between 0.2% and 2.3%). The improvements are larger when the probability of a shock is larger and $\sigma$ is smaller (see Figures 8 and 9 in the SI). The RHS outperforms both the standard normal and the Student’s t-distribution in the simulation study.
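Two of these quantities, the correlation and the RMSE, are straightforward to compute once point estimates $\hat{\theta}$ are available. The following plain-Python sketch shows one way to do so; the function names are hypothetical and the numbers in the usage example are made up purely for illustration. How the relative improvements in Figure 2 are derived from such values is documented in the authors' replication materials and is not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


def rmse(estimate, truth):
    """Root mean squared error of the estimates around the true values."""
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(estimate, truth)) / len(truth))


if __name__ == "__main__":
    # Made-up values for illustration only.
    truth = [-1.0, -0.5, 0.0, 0.5, 1.0]
    est_a = [-0.9, -0.6, 0.1, 0.4, 1.1]
    est_b = [-0.7, -0.8, 0.3, 0.2, 1.3]
    # Differences are taken with specification A as the reference.
    print(pearson(est_a, truth) - pearson(est_b, truth))
    print(rmse(est_a, truth) - rmse(est_b, truth))
```

With the reference specification listed first, a positive difference in correlation and a negative difference in RMSE both indicate that the reference performs better.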

4.2 Revisiting Pemstein et al. (2010)

RKF estimate a dynamic model of countries’ democracy scores based on the static model of Pemstein et al. (2010) (via an ordered logit specification). Here, we first focus on model fit via the ELPD (based on LOO-CV). The differences between specifications are large and statistically significant. The specification using the RHS performs better than the one using the t-distribution by 1781 units (SE: 58), and better than the one using the normal distribution by 6023 units (SE: 125). Using the RHS thus results in a better model fit than using either of the other two distributions. Second, we turn to predictive accuracy: how often do predictions equal the observed values of democracy indicators? Here, too, the RHS performs better than the other specifications. Overall, the predictive accuracy using the RHS is 62.9% (95% credible interval: 62.6%–63.2%). It is 61.7% (95% ci: 61.4%–62.0%) for the t-distribution and 58.5% (95% ci: 58.2%–58.8%) for the standard normal distribution. A breakdown by indicator is provided in Figure 11 in the SI. Using the RHS results in a higher predictive accuracy than using the other two priors.

Figure 3 Trajectories of the Philippines and Afghanistan (1946–2008; mean and 95% ci).

Third, we revisit the two countries discussed by RKF: the Philippines and Afghanistan (Figure 3). The countries exhibit longer periods of stasis (e.g., after the Marcos regime in the Philippines) as well as rapid change (e.g., the Saur Revolution in Afghanistan). Both the RHS and the t-distribution more aptly recover abrupt large changes in latent democracy scores than the normal distribution does. In addition, the RHS appears to have narrower credible intervals than the t-distribution does in some periods of stability (e.g., before and after the Marcos regime).

What if we consider the results of all country-year observations rather than this selection? We turn to this in Figure 4, in which we compare the magnitude of year-over-year changes in countries’ mean democracy scores and the credible intervals of their scores between models. We subtract the values from the standard normal and t-distribution specifications from those based on the RHS. There is little systematic difference in the year-over-year changes across models (panel (a)). In contrast, the credible intervals of the RHS are systematically smaller than those of either other distribution (panel (b)). On average, the intervals of the RHS are 44% the size of those using the standard normal distribution, and 73% of those based on the Student’s t-distribution. If researchers use estimates from similar models in further explanatory models and take measurement uncertainty into account (as suggested by Tai et al. 2024), then the RHS will produce less variance and more precision to begin with.

Figure 4 Year-over-year changes and credible intervals by model.

5 Conclusion

In this letter, we introduce the regularized horseshoe as a prior to political science, and discuss its theoretical and empirical utility in the domain of dynamic latent variable models where periods of stasis can be interrupted by moments of rapid change. In our replication of RKF, the RHS prior performed better than either the standard normal or the Student’s t-distribution. The use of the RHS leads to more accurate and precise estimates as well as better model fit.

Our substantive focus has been on modeling countries’ democracy scores over time. It is likely that the RHS prior would perform similarly well in other areas where dynamic processes are characterized by a punctuated equilibrium, such as the trajectories of party positions, public opinion, or budgetary allocations. In addition, due to its sparsity-inducing nature, the RHS could be a candidate for applications where researchers wish to find a balance between a model’s flexibility and rigidity via a prior distribution, such as in the case of differential item functioning (see, e.g., Binding, Koedam, and Steenbergen 2024).

Acknowledgments

We thank Marco R. Steenbergen and Artur Pokropek for their insightful feedback, as well as the reviewers and editor.

Competing Interest

None.

Funding Statement

The work of the first author was supported by the Swiss National Science Foundation (Grant No. 212396). The work of the second author was supported by the Polish National Science Centre (Sonata Bis-10: UMO-2020/38/E/HS6/00302).

Data Availability Statement

Replication code for this article is available in Binding and Koc (2024) at https://doi.org/10.7910/DVN/G2VRQH.

Supplementary Material

For supplementary material accompanying this paper, please visit https://doi.org/10.1017/pan.2024.30.

Footnotes

Edited by: Jeff Gill

1 To the best of our knowledge, the RHS has so far not been discussed or used in political science. Its predecessor, the horseshoe, was employed by Ratkovic and Tingley (2017) and considered by Park and Yamauchi (2023).

2 Here, our discussion focuses on the RHS. In the Supplementary Information (SI), we also discuss and assess the original horseshoe and discrete spike-and-slab priors (George and McCulloch 1993; Mitchell and Beauchamp 1988).

3 Both $\nu$ and $s$ could be set to other values. We chose these values so that $\tau$, $\lambda_i$, and $c^2$ have the same Student’s t prior (equivalent to a Cauchy distribution), and to thereby allow for large deviations from 0 across all three parameters via their prior. Future research could explore the results’ sensitivity to these choices.

4 RKF assign (half) normal distributions with a standard deviation of 3 to $\alpha$, $\beta$, and $\sigma$, while we assign a standard deviation of 1. This difference does not matter within this comparison.

5 We omit $\sigma$ when using the RHS because $\tau$ performs a similar function in the RHS prior and its inclusion would lead to identification issues.

6 We omit RKF’s replication of Martin and Quinn (2002) for brevity and because the lack of shared items from one time-period to the next (as Supreme Court Justices are faced with new cases every session) necessitates a series of rigid constraints on parameters.

References

Bafumi, J., Gelman, A., Park, D. K., and Kaplan, N. 2005. “Practical Issues in Implementing and Understanding Bayesian Ideal Point Estimation.” Political Analysis 13 (2): 171–187.

Binding, G., and Koc, P. 2024. “Replication Data for: Adding Regularized Horseshoes to the Dynamics of Latent Variable Models.” Version DRAFT VERSION. https://doi.org/10.7910/DVN/G2VRQH.

Binding, G., Koedam, J., and Steenbergen, M. R. 2024. “The Comparative Meaning of Political Space: A Comprehensive Modeling Approach.” Political Science Research and Methods 12 (3): 643–51. https://doi.org/10.1017/psrm.2023.16.

Carpenter, B., et al. 2017. “Stan: A Probabilistic Programming Language.” Journal of Statistical Software 76 (1). https://doi.org/10.18637/jss.v076.i01.

Carvalho, C. M., Polson, N. G., and Scott, J. G. 2010. “The Horseshoe Estimator for Sparse Signals.” Biometrika 97 (2): 465–480. https://doi.org/10.1093/biomet/asq017.

Caughey, D., and Warshaw, C. 2015. “Dynamic Estimation of Latent Opinion Using a Hierarchical Group-Level IRT Model.” Political Analysis 23 (2): 197–211.

George, E. I., and McCulloch, R. E. 1993. “Variable Selection via Gibbs Sampling.” Journal of the American Statistical Association 88 (423): 881–889.

König, T., Marbach, M., and Osnabrügge, M. 2013. “Estimating Party Positions across Countries and Time—A Dynamic Latent Variable Model for Manifesto Data.” Political Analysis 21 (4): 468–491.

Martin, A. D., and Quinn, K. M. 2002. “Dynamic Ideal Point Estimation via Markov chain Monte Carlo for the US Supreme Court, 1953–1999.” Political Analysis 10 (2): 134–153.

Mitchell, T. J., and Beauchamp, J. J. 1988. “Bayesian Variable Selection in Linear Regression.” Journal of the American Statistical Association 83 (404): 1023–1032.

Park, J. H., and Yamauchi, S. 2023. “Change-Point Detection and Regularization in Time Series Cross-Sectional Data Analysis.” Political Analysis 31 (2): 257–277.

Pemstein, D., Meserve, S. A., and Melton, J. 2010. “Democratic Compromise: A Latent Variable Analysis of Ten Measures of Regime Type.” Political Analysis 18 (4): 426–449.

Piironen, J., and Vehtari, A. 2017. “Sparsity Information and Regularization in the Horseshoe and Other Shrinkage Priors.” Electronic Journal of Statistics 11 (2): 5018–5051. https://doi.org/10.1214/17-EJS1337SI.

Polson, N. G., and Scott, J. G. 2011. “Shrink Globally, Act Locally: Sparse Bayesian Regularization and Prediction.” In Bayesian Statistics 9, 501–538. Oxford: Oxford University Press. https://doi.org/10.1093/acprof:oso/9780199694587.003.0017. With discussions by Bertrand Clark, C. Severinski, Merlise A. Clyde, Robert L. Wolpert, Jim E. Griffin, Philip J. Brown, Chris Hans, Luis R. Pericchi, Christian P. Robert and Julyan Arbel…

Ratkovic, M., and Tingley, D. 2017. “Sparse Estimation and Uncertainty with Application to Subgroup Analysis.” Political Analysis 25 (1): 1–40.

Reuning, K., Kenwick, M. R., and Fariss, C. J. 2019. “Exploring the Dynamics of Latent Variable Models.” Political Analysis 27 (4): 503–517.

Schnakenberg, K. E., and Fariss, C. J. 2014. “Dynamic Patterns of Human Rights Practices.” Political Science Research and Methods 2 (1): 1–31.

Tai, Y. C., Hu, Y., and Solt, F. 2024. “Democracy, Public Support, and Measurement Uncertainty.” American Political Science Review 118 (1): 512–518. https://doi.org/10.1017/S0003055422000429.

Vehtari, A., Gelman, A., and Gabry, J. 2017. “Practical Bayesian Model Evaluation Using Leave-One-Out Cross-Validation and WAIC.” Statistics and Computing 27: 1413–1432.
