# Example Analysis

One of the chief advantages of the UDS over existing democracy scales is that they accompany each democracy rating with a quantitative estimate of measurement uncertainty. The probability model that drives the UDS does not simply produce point estimates of democracy levels across space and time, but generates full posterior distributions for each rating. Unfortunately, relatively few social scientists are familiar with how to use samples from these posterior distributions in subsequent analyses. In what follows, we walk users through a brief tutorial that demonstrates just such an analysis (download this tutorial, including data and .do files). This example uses the original UDS release from 2010.

## Introduction

In a recent article, Gretchen Casper and Claudia Tufis demonstrate that existing democracy measures, although highly correlated, produce different results from one another when used to test a simple model of democratization. We're going to replicate part of Casper and Tufis' (2003) analysis and extend it by refitting the democratization model, using the UDS in place of a traditional democracy measure. In the process, we'll show you how to work with the UDS posterior samples on this site and how to take measurement error in the UDS into account when using the scores in subsequent analyses. We're going to perform the entire analysis in Stata 10, but the process is similar in other statistical software.

## Replication

Download Casper and Tufis' (2003) replication dataset (local copy) from Casper's website. If you wish to replicate their entire analysis, you can also download and run the Stata do-file (local copy). For this tutorial, we're just going to replicate columns 5, 6, and 7 of Table 1 on page 5. Casper and Tufis' democratization model uses a variety of lagged economic indicators, education measures, and political institution variables to predict a nation's democracy level in a given year. They use a straightforward statistical approach, running a linear regression with panel-corrected standard errors. Table 1 duplicates columns 5, 6, and 7 in Casper and Tufis' table and displays the results of fitting this model using Polity IV, Vanhanen's Polyarchy 1.2 dataset, and Freedom House's democracy measure as the dependent variable for the 1975-1992 period. Bold numbers indicate that coefficients are statistically significant at the 0.05 level.

| | Polity | Vanhanen | Freedom House |
|---|---|---|---|
| GDP pc, logged | 3.372 | 6.549 | 2.236 |
| | (0.280) | (0.492) | (0.132) |
| Real GDP pc growth | -0.023 | -0.046 | -0.010 |
| | (0.009) | (0.014) | (0.004) |
| Openness | -0.009 | -0.012 | -0.004 |
| | (0.004) | (0.008) | (0.002) |
| Inflation | -0.002 | 0.016 | -0.003 |
| | (0.010) | (0.013) | (0.003) |
| Primary Education | -0.102 | -0.236 | -0.038 |
| | (0.047) | (0.088) | (0.018) |
| Secondary Education | 0.055 | 0.159 | 0.030 |
| | (0.099) | (0.171) | (0.042) |
| Presidential | 0.508 | 0.011 | 0.607 |
| | (0.416) | (0.621) | (0.191) |
| Parliamentary | 2.059 | 2.924 | 0.768 |
| | (0.592) | (0.880) | (0.230) |
| Party Fractionalization | 3.598 | 7.424 | 1.864 |
| | (0.869) | (1.254) | (0.465) |
| Constant | -16.291 | -40.612 | -9.713 |
| | (1.704) | (2.899) | (0.882) |

Performing this analysis in Stata is straightforward. After loading the replication dataset, limit the data to the 1975-1992 period. Then prepare the data for panel analysis with the tsset command and run the panel-corrected linear regression with the xtpcse command, confining the analysis to observations for which all three measures provide scores. The code to perform these operations is displayed below. Note that the independent variables are all lagged one year, using the L1 operator. If you're doing everything right, your output will match the results in Table 1.

    *** Get rid of pre-75 observations
    drop if year < 1975

    *** Prep for panel analysis
    tsset id year, yearly

    *** Column 5: dv = polity
    xtpcse polityiv L1.pcaplog L1.rgdppcgr L1.open L1.cpi L1.prime L1.second L1.presiden L1.parliamn L1.bksfrac if dv1==1 & dv2==1 & dv3==1, pairwise c(a)

    *** Column 6: dv = vanhanen
    xtpcse poly12 L1.pcaplog L1.rgdppcgr L1.open L1.cpi L1.prime L1.second L1.presiden L1.parliamn L1.bksfrac if dv1==1 & dv2==1 & dv3==1, pairwise c(a)

    *** Column 7: dv = freedom house
    xtpcse fhscore L1.pcaplog L1.rgdppcgr L1.open L1.cpi L1.prime L1.second L1.presiden L1.parliamn L1.bksfrac if dv1==1 & dv2==1 & dv3==1, pairwise c(a)
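If the L1 operator is unfamiliar, a small toy panel can make its behavior concrete. The sketch below uses made-up data (the variable names `x` and `lag_x` and all values are illustrative assumptions, not part of the replication dataset) to show that, once the panel is declared with tsset, `L1.x` picks up the previous year's value within each unit and is missing for a unit's first year.

```stata
* Toy panel, made up for illustration only
clear
input id year x
1 1975 10
1 1976 20
1 1977 30
2 1975 5
2 1976 6
end

* Declare the panel structure, as in the tutorial
tsset id year, yearly

* L1.x is the value of x for the same id in the previous year;
* it is missing where no previous year exists for that id
generate lag_x = L1.x
list id year x lag_x, sepby(id)
```

Because the lag is taken within `id`, the first observation of country 2 does not inherit a value from the last observation of country 1.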

## Extension

Now that we've replicated part of Casper and Tufis' analysis and walked through the basic Stata commands used to fit the democratization model, we're ready to merge the UDS into the dataset and refit the model, taking measurement error into account. Both the UDS and Casper and Tufis' replication dataset use COW country codes, making it easy to merge the data. First, clear out your Stata environment, make sure you've allocated a reasonable amount of memory, and load the UDS 1000-draw sample.

    *** Load the UDS
    clear
    set mem 500M
    insheet using "uds_1000.csv"

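Next, merge in the Casper and Tufis dataset, eliminate observations before 1975 or after 1992, and drop duplicated country-years. Cases for which not all democracy measures provide scores are excluded later, by the `if dv1==1 & dv2==1 & dv3==1` condition in each regression command.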

    *** Merge datasets
    gen id = cowcode
    sort id year
    merge id year using "PA_DTA_file.dta"

    *** Drop unused observations
    drop if year < 1975 | year > 1992
    duplicates drop id year, force /*corrects problem in the initial UDS release*/
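Before dropping anything, it can be worth checking how a merge like this went. The following sketch uses two tiny made-up datasets (the values, the variable `x`, and the temporary file name `covars` are illustrative assumptions) and the same Stata 10 merge syntax as the tutorial. It tabulates the `_merge` variable that the merge command creates, removes a duplicated country-year, and then confirms that `id` and `year` uniquely identify the remaining rows.

```stata
* Toy "using" dataset standing in for the replication data
clear
input id year x
2 1975 1
2 1976 2
end
sort id year
tempfile covars
save `covars'

* Toy "master" dataset standing in for the UDS sample,
* with one duplicated country-year and one unmatched country
clear
input cowcode year mean
2 1975 0.5
2 1976 0.6
2 1976 0.6
20 1975 1.2
end
generate id = cowcode
sort id year

* Stata 10 style match merge, as in the tutorial
merge id year using `covars'

* _merge: 1 = master only, 2 = using only, 3 = matched
tabulate _merge

* Remove the duplicated country-year and verify the key is unique
duplicates drop id year, force
isid id year
```

The `isid` command exits with an error if duplicates remain, so it doubles as a cheap safeguard ahead of tsset, which also requires unique panel-time pairs.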

There are (at least) two different ways we can incorporate the UDS into the analysis at this point. One option is to treat the UDS as simple point estimates, just as we treat Freedom House, Polity, and the Vanhanen scores. This approach is straightforward and easy. It also has potential advantages over using any single-rater democracy score in that it represents a compromise between a wide array of measures from experts across the field. The first column of Table 2 displays the results of this approach, which treats the mean of the UDS' posterior densities as the dependent variable in the Casper and Tufis model. This approach, which is demonstrated in the code listing following this paragraph, generates results that differ slightly from any of the columns in Table 1. Nonetheless, they provide few surprises; coefficient directions are consistent with Table 1 and every statistically significant coefficient in the UDS model is significant in at least one of the models in Table 1.

    *** Prep for panel analysis
    tsset id year, yearly

    *** Run the democratization model with UDS point estimates
    xtpcse mean L1.pcaplog L1.rgdppcgr L1.open L1.cpi L1.prime L1.second L1.presiden L1.parliamn L1.bksfrac if dv1==1 & dv2==1 & dv3==1, pairwise c(a)

Treating the UDS as point estimates is simple, but potentially misleading. A major contribution of the UDS is that they, unlike most other available measures, provide the analyst with quantitative estimates of uncertainty. A Unified Democracy Score for a given country is not represented simply by a single number but by a posterior density. The UDS do not purport to provide infallible democracy judgments but rather acknowledge the impact of measurement error, providing ratings in terms of probability distributions.

| | UDS Mean | UDS 1000 Sample |
|---|---|---|
| GDP pc, logged | 0.511 | 0.416 |
| | (0.034) | (0.041) |
| Real GDP pc growth | -0.004 | -0.003 |
| | (0.001) | (0.002) |
| Openness | -0.001 | -0.001 |
| | (0.000) | (0.001) |
| Inflation | 0.000 | 0.001 |
| | (0.001) | (0.001) |
| Primary Education | -0.020 | -0.026 |
| | (0.005) | (0.005) |
| Secondary Education | 0.005 | 0.012 |
| | (0.011) | (0.013) |
| Presidential | 0.123 | 0.165 |
| | (0.041) | (0.053) |
| Parliamentary | 0.291 | 0.448 |
| | (0.062) | (0.069) |
| Party Fractionalization | 0.508 | 0.989 |
| | (0.103) | (0.132) |
| Constant | -3.898 | -3.333 |
| | (0.217) | (0.263) |

The current context highlights the importance of measurement confidence. While a point-estimate approach to incorporating the UDS into the current analysis lends support to the importance of both trade openness and presidentialism in predicting democracy level, the fact that only one of the three single-rater measures supports these claims naturally leads one to question our confidence in these results. The probability distributions representing the UDS take such factors as rater reliability and agreement into account and provide us with a way to propagate our estimates of measurement error into the inferences we wish to draw from the democratization model (see our article for a full description of the UDS and their underlying probability model).

We can propagate uncertainty in the UDS to the democratization model using an iterative Monte Carlo approach. At each iteration we:

- Sample from the posterior distribution of the UDS.
- Fit the Casper and Tufis model, using the UDS posterior draw as the dependent variable, and extract the coefficient and panel-corrected variance-covariance matrix from the fitted model.
- Draw and save a single vector from the multivariate normal density with mean equal to the fitted model coefficients and variance-covariance matrix equal to the fitted model's variance-covariance matrix.

This procedure, which is demonstrated in the code listing below, yields a sample from the marginal posterior density of the Casper and Tufis model coefficients, treating both the model coefficients and the UDS as random variables, subject to various assumptions about the conditional independence of the UDS and the model parameters. (For a more thorough discussion of the approach described here, look up the "method of composition" in a good reference on statistical simulation, such as Martin A. Tanner. 1993. *Tools for Statistical Inference: Methods for the Exploration of Posterior Distributions and Likelihood Functions*. Second ed. New York: Springer-Verlag. pp. 30.)
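The three steps can also be written compactly. The notation here is our own shorthand for this tutorial and is an assumption, not taken from the source article: $z$ is the vector of democracy scores, $R$ the rater data behind the UDS, $X$ the lagged covariates, $\beta$ the model coefficients, and $M$ the number of posterior draws (1000 in this example). Assuming that $\beta$ depends on $R$ only through $z$, and that $z$ given $R$ does not depend on $X$, the marginal posterior of the coefficients is

$$
p(\beta \mid X, R) = \int p(\beta \mid z, X)\, p(z \mid R)\, dz .
$$

Each Monte Carlo iteration $m = 1, \dots, M$ produces one draw from this density by composition:

$$
z^{(m)} \sim p(z \mid R), \qquad
\beta^{(m)} \sim \mathcal{N}\!\left(\hat{\beta}\big(z^{(m)}\big),\ \hat{V}\big(z^{(m)}\big)\right),
$$

where $\hat{\beta}(z^{(m)})$ and $\hat{V}(z^{(m)})$ are the coefficient estimates and panel-corrected variance-covariance matrix from fitting the model to draw $z^{(m)}$. The multivariate normal in the second step is an approximation to $p(\beta \mid z, X)$, so the collected $\beta^{(m)}$ should be read as an approximate sample from the marginal posterior.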

    *** Prep for the monte carlo
    set more off
    set matsize 1000

    *** Note that the matsize must be at least as large as the larger
    *** dimension in your posterior matrix, in this case 1000 rows.
    *** While this is possible in Stata SE and MP, Intercooled Stata
    *** puts an upper limit of 800 on matsize. If you are using
    *** Stata IC, set matsize to n <= 800 and change the loop below to
    *** iterate from 1 to n, thus using only the first n draws
    *** of the UDS posterior sample.

    *** Start from an empty posterior matrix, so that rerunning this
    *** listing does not append to the draws from a previous run
    capture matrix drop posterior

    *** Run the monte carlo
    forvalues i = 1/1000 {

        *** Print out an iteration number
        display `i'

        *** Fit the model, using the ith draw from the UDS posterior
        quietly xtpcse z`i' L1.pcaplog L1.rgdppcgr L1.open L1.cpi L1.prime L1.second L1.presiden L1.parliamn L1.bksfrac if dv1==1 & dv2==1 & dv3==1, pairwise c(a)

        *** Extract the coefficients and variance-covariance matrix
        matrix b = e(b)
        matrix V = e(V)
        local blength = colsof(b)

        *** Preserve the dataset, take a single multivariate normal draw from the
        *** posterior distribution of the coefficients, and restore the dataset.
        *** We use the capture command to catch possible errors in drawnorm
        *** and drop these iterations gracefully.
        preserve
        capture quietly drawnorm b1-b`blength', double n(1) means(b) cov(V) clear
        if _rc == 0 {
            mkmat b1-b`blength', matrix(bsample)
            matrix posterior = nullmat(posterior) \ bsample
        }
        else {
            display "Error drawing sample...iteration dropped"
        }
        restore
    }

Upon finishing the Monte Carlo procedure, we are left with a sample from the posterior density of the democratization model's coefficients. This sample is like any other generated from a Bayesian simulation approach and we can easily summarize it. For example, the means of the coefficient posteriors are reasonable point estimates of the impact of each independent variable on democracy level. Furthermore, we can construct credible intervals--the Bayesian version of confidence intervals--around these point estimates simply by calculating various percentiles of the posterior sample. The code below shows how to calculate the means, standard deviations, and 2.5 and 97.5 percentiles of the coefficient posteriors, forming point estimates and 95 per cent credible (confidence) intervals.

    *** Get posterior ready to work with
    svmat posterior

    *** Calculate means and standard deviations
    tabstat posterior*, stat(mean sd)

    *** Find the bounds of the 95 percent credible interval
    centile posterior*, centile(2.5, 97.5)
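To decide whether a 95 per cent credible interval covers zero without reading the bounds off a table, the percentiles can be pulled into macros and compared directly. The sketch below is a minimal illustration on simulated draws rather than the real posterior: the variable `posterior1` is filled with made-up normal draws (mean 0.4, standard deviation 0.05, both arbitrary assumptions) so that the listing runs on its own. It uses `_pctile`, which returns the requested percentiles in `r(r1)` and `r(r2)`.

```stata
* Fake draws standing in for one column of the posterior matrix
clear
set seed 12345
set obs 1000
generate double posterior1 = 0.4 + 0.05*invnormal(uniform())

* 2.5th and 97.5th percentiles of the draws
_pctile posterior1, percentiles(2.5 97.5)
local lo = r(r1)
local hi = r(r2)

* The interval excludes zero if both bounds fall on the same side of it
local excludes_zero = (`lo' > 0) | (`hi' < 0)
display "95% interval: [" %6.3f `lo' ", " %6.3f `hi' "]"
display "Excludes zero (1 = yes): " `excludes_zero'

* Share of draws above zero, a rough posterior probability of a positive effect
count if posterior1 > 0
display "Share of draws above zero: " %5.3f r(N)/_N
```

When applying this to the real sample, the variables created by svmat should follow the column order of `e(b)`, so `posterior1` through `posterior9` would correspond to the nine lagged regressors in the order they appear in the xtpcse command and `posterior10` to the constant.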

The second column of Table 2 displays the results of the Monte Carlo approach to estimating the democratization model, providing posterior means and, in parentheses, standard deviations (note that, because these values are generated by simulation, they will vary slightly from run to run). Estimates with 95 per cent credible intervals that do not cover zero--coefficients that are statistically significant at the 5 per cent level--are highlighted in bold. Taking measurement error into account makes a difference in the inferences we can draw from the democratization model. Furthermore, the impact of measurement error can be unpredictable. Real GDP growth, a variable with a statistically significant effect in all four point-estimate-based specifications, drops out in the Monte Carlo analysis. On the other hand, presidentialism and trade openness, which are both only statistically significant in one of the original models and the UDS Mean model, withstand propagating measurement error into the democratization model.

## Conclusion

Latent constructs like democracy are measured with error, but typical social science analyses treat democracy scores as if they were known with certainty. The preceding tutorial demonstrates how to fit a model with the UDS, using a Monte Carlo procedure to propagate measurement error in the democracy scores into the final estimates. This approach is extremely flexible and can be applied to virtually any statistical model you might fit with a traditional, purely point-estimate-based democracy measure. Furthermore, while the UDS are the dependent variable in the democratization model examined here, the scores can enter the analysis on either side of the equation with no changes to the Monte Carlo procedure.