
A Note on the Evaluation of Error and Transformation in Data Analysis

Published online by Cambridge University Press:  02 September 2013

Bruce M. Russett
Affiliation:
Yale University

Type
Communications
Copyright
Copyright © American Political Science Association 1965


References

1 Russett, Bruce M., Alker, Hayward R. Jr., Deutsch, Karl W., and Lasswell, Harold D., World Handbook of Political and Social Indicators (New Haven, Conn., Yale University Press, 1964), reviewed by Arthur S. Banks in this REVIEW, Vol. 59 (March, 1965), pp. 144–146.

2 Two cases asserted to be in error can be shown to be correct, though, as may occur elsewhere, the reasons (comparability adjustments we made in the figures published in our sources) may not be obvious. Furthermore, two kinds of error should be distinguished. One is erroneous data, the other is erroneous footnoting of correct data. For example, two instances of the latter occurred when superscript references to sources were omitted and the reference thus seemed to be to the general source listed for the table. The data, correct, are to be found in others of the works given at the foot of page 53. An erratum sheet is available on request from the Yale Political Data Program, 89 Trumbull Street, New Haven, Connecticut 06520.

3 Highly skewed distributions should of course be subjected to a logarithmic or other transformation before correlation coefficients are computed. For this computation the data were transformed precisely as were those in the Handbook for the analysis given in Part B. The correlations reported here and below will be only those statistically significant at the .01 level with a one-tailed test. It would be misleading to report all correlations, however trivial, because with very low correlations small changes can produce substantial variation in the r (for example from .20 to .30) without affecting what one may be most interested in, the r², by nearly as much (i.e., from .04 to .09). Such low correlations are likely to occur by chance in any case. For a discussion of the relevance of “statistical significance” in this situation, however, see the Handbook, p. 263.
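
The arithmetic in note 3 can be checked with a short sketch. The function name `variance_explained` is an illustrative assumption, not anything taken from the Handbook; the block only squares the two correlation values quoted in the note and compares the size of the two changes.

```python
def variance_explained(r):
    """Return r squared, the share of variance accounted for by a correlation r."""
    return r * r


# The two correlation values quoted in note 3.
low_r, high_r = 0.20, 0.30

change_in_r = high_r - low_r
change_in_r_squared = variance_explained(high_r) - variance_explained(low_r)

# Rounded for display only; the comparison itself uses the unrounded values.
print("change in r:  ", round(change_in_r, 2))
print("change in r^2:", round(change_in_r_squared, 2))
```

Under these figures the movement in r is twice the movement in r², which is the sense in which small shifts among low correlations look larger than they are.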

4 The review also emphasizes the inappropriateness of applying a logarithmic transformation to a left-skewed distribution and perhaps leaves the impression that we erroneously did so. This is not the case—there are only two left-skewed distributions in the book, and neither was transformed.

5 One measure of skewness is that suggested in the Handbook, where skewness = 3(X̄ − Md)/σ, and X̄ = the mean, Md = the median, and σ = the standard deviation. By this measure the degree of skewness in the four distributions ranges from .18 to .79. For those distributions in the Handbook that we did transform, the mean value for skewness was 1.36.
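
A minimal sketch of the measure in note 5, three times the difference between the mean and the median, divided by the standard deviation. The function name `median_skewness`, the use of the population standard deviation, and the sample values are all assumptions made for illustration; the note does not say which form of σ was used, and none of the numbers below come from the Handbook.

```python
from math import log
from statistics import mean, median, pstdev


def median_skewness(values):
    """Skewness as 3 * (mean - median) / standard deviation.

    Assumption: the population standard deviation (pstdev) stands in for sigma.
    """
    sd = pstdev(values)
    if sd == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return 3 * (mean(values) - median(values)) / sd


# Hypothetical right-skewed values (each one double the last).
raw = [1, 2, 4, 8, 16, 32, 64]
# A logarithmic transformation of the same values.
logged = [log(v) for v in raw]

print("skewness before transformation:", median_skewness(raw))
print("skewness after log transformation:", median_skewness(logged))
```

With these made-up values the raw series is clearly right-skewed, while the logged series is evenly spaced and so has essentially no skew by this measure.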

6 Except for Tables 9 and 51 the variables employed in the preceding analysis were included in Banks, Arthur S. and Textor, Robert B., A Cross-Polity Survey (Cambridge, Mass., M.I.T. Press, 1963), as dichotomized variables. I substituted Table 40, which has a counterpart in the Survey, in the following analysis. In my review of Banks and Textor [“Strategies for Comparing Nations,” Journal of Conflict Resolution, Vol. 9 (June 1964), pp. 166–70] I suggested some regrettable consequences of the decision to dichotomize, but there are also weighty arguments in its support. My own “The Calculus of Deterrence,” ibid., Vol. 8 (June 1963), pp. 97–109, looked for association between dichotomized variables.

7 Though it is not a common practice to compute r for dichotomized data, it is permissible so long as the n for either side of the dichotomy is not seriously disproportionate and so long as one is not concerned with significance tests.
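
As a hedged illustration of note 7: when both variables are coded 0 and 1, the ordinary product-moment r equals the phi coefficient computed from the fourfold table. The function names and the eight hypothetical cases below are assumptions for illustration only and are not data from the Handbook or the Survey.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Product-moment correlation computed directly from its definition."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def phi_from_counts(a, b, c, d):
    """Phi from a fourfold table: a = both 1, b = x only, c = y only, d = both 0."""
    return (a * d - b * c) / sqrt((a + b) * (c + d) * (a + c) * (b + d))


# Hypothetical dichotomized scores for eight cases, evenly split on each variable.
xs = [1, 1, 1, 0, 0, 0, 1, 0]
ys = [1, 1, 0, 0, 0, 1, 1, 0]

# Tally the four cells of the fourfold table from the paired scores.
a = sum(1 for x, y in zip(xs, ys) if x == 1 and y == 1)
b = sum(1 for x, y in zip(xs, ys) if x == 1 and y == 0)
c = sum(1 for x, y in zip(xs, ys) if x == 0 and y == 1)
d = sum(1 for x, y in zip(xs, ys) if x == 0 and y == 0)

print("r on the 0/1 scores:", pearson_r(xs, ys))
print("phi from the table: ", phi_from_counts(a, b, c, d))
```

The example keeps each dichotomy evenly split, in line with the note's caution about disproportionate n on either side.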

8 For a useful piece by a political scientist, however, see the forthcoming article by Rudolph Rummel, “Dimensions of Error in Cross-National Data.”
