Andrews and others (1971) have correctly noted that the product-moment correlation coefficient cannot be used on cumulative data. I wish to add that the value of r between two cumulative series is not independent of the scale used for measurement, and that it is in fact possible to obtain nearly any desired value of r by simply adjusting the mean values of the two series before cumulating them.
The product-moment correlation coefficient is designed to deal with data which follow a Normal distribution. Observations of such data may be expressed as
$$x_i = \bar{x} + e_{x_i} \tag{1}$$
where x̄ is some mean value and e_x is N(0, σ). When a series of observations of the Normal variate of Equation (1) is expressed in cumulative form, the nth observation becomes
$$X_n = \sum_{i=1}^{n} x_i = n\bar{x} + \sum_{i=1}^{n} e_{x_i} \tag{2}$$
The final term in Equation (2) introduces a serial correlation which destroys the independence of the observations and converts the series of random Normal observations into a one-dimensional random walk (Mitchell and others, 1966, p. 6). This effect is illustrated in Figure 1; 500 random Normal observations were generated on a CDC 6400 computer using the algorithm of Naylor and others (1966, p. 95), and these are plotted as a raw series (Fig. 1a) and as a cumulated series (Fig. 1b). It is clear that the cumulated series is far from random. The transformation from raw to cumulated series also substantially alters the correlation between two random series, as shown in Table I. Ten pairs of random Normal series, e_x and e_y, were generated (N = 500) and the correlations between them were calculated for both raw (Equation (1)) and cumulated (Equation (2)) series, with x̄ = ȳ = 0. The correlations between the cumulated series give no hint of the basic lack of relationship between the raw observations.
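The comparison between raw and cumulated correlations can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses Python's standard `random` generator rather than the CDC 6400 algorithm cited above, it takes σ = 1, and the helper names `pearson` and `raw_and_cumulated_r` are hypothetical. It does not reproduce the values of Table I.

```python
import random
from itertools import accumulate
from math import sqrt


def pearson(a, b):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / sqrt(saa * sbb)


def raw_and_cumulated_r(n=500, seed=0):
    """Return (r of raw series, r of cumulated series) for one random pair."""
    rng = random.Random(seed)
    # Assumption: zero-mean Normal errors with sigma = 1.
    ex = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ey = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Cumulating turns each series into a one-dimensional random walk.
    cx = list(accumulate(ex))
    cy = list(accumulate(ey))
    return pearson(ex, ey), pearson(cx, cy)


# Ten independent pairs, as in the experiment described above.
for seed in range(10):
    r_raw, r_cum = raw_and_cumulated_r(seed=seed)
    print(f"pair {seed + 1:2d}: raw r = {r_raw:+.3f}, cumulated r = {r_cum:+.3f}")
```

The raw coefficients should stay close to zero, while the cumulated ones can wander over much of the interval from −1 to +1 even though the underlying series are unrelated.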
Another potential source of error is that if the mean value of a series is different from zero, then the first term on the right side of Equation (2) will introduce a linear trend into the set of cumulative observations. The magnitude of the trend depends upon the absolute value of the mean and upon the length of the series, while the direction of the trend depends on the sign of the mean.
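The size of this trend relative to the random component can be made explicit. The following is a short worked sketch, assuming that the cumulated series is the sum of a term proportional to the mean and a running sum of independent errors, and that σ denotes the standard deviation of each error term. The symbols X_n and S_n are introduced here for illustration only.

```latex
\begin{align*}
X_n &= n\bar{x} + S_n, \qquad S_n = \sum_{i=1}^{n} e_{x_i} \\
\operatorname{E}[X_n] &= n\bar{x}, \qquad \operatorname{sd}(S_n) = \sigma\sqrt{n} \\
\frac{\lvert \operatorname{E}[X_n] \rvert}{\operatorname{sd}(S_n)} &= \frac{\lvert \bar{x} \rvert}{\sigma}\,\sqrt{n}
\end{align*}
```

Under these assumptions, with n = 500 a mean of only 0.2σ already gives a final trend value about 4.5 times the standard deviation of the random component, so even small non-zero means can dominate a long cumulated series.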
When the means of both series in a correlation analysis are different from zero, the introduced linear trends tend to dominate the relationship between the two sets of cumulative observations. A simulation model was designed to study the behavior of the correlation coefficient between two cumulated random Normal series, x and y, when different combinations of x̄ and ȳ were added to the series before cumulation. Two 500-observation series of random Normal variates corresponding to the e_x (or e_y) terms of Equation (1) were generated. Values of x̄ and ȳ were varied from −0.2 to +0.2 by steps of 0.04. For each possible combination of x̄ and ȳ the two series were converted to the cumulative form according to Equation (2), and the correlation between them was calculated.
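A simulation of this kind might be sketched as follows. This is not the original program: it assumes σ = 1 and Python's standard `random` generator, and the names `cumulate_with_mean`, `paper_grid` and `mean_grid_correlations` are hypothetical helpers chosen for illustration.

```python
import random
from itertools import accumulate
from math import sqrt


def pearson(a, b):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / sqrt(saa * sbb)


def cumulate_with_mean(errors, mean):
    """Add a constant mean to every error term, then form running sums."""
    return list(accumulate(e + mean for e in errors))


def paper_grid():
    """Means from -0.2 to +0.2 in steps of 0.04 (11 values)."""
    return [round(-0.2 + 0.04 * k, 2) for k in range(11)]


def mean_grid_correlations(means, n=500, seed=1):
    """Correlate cumulated series for every (x mean, y mean) combination."""
    rng = random.Random(seed)
    # The same two error series are reused for every combination of means.
    ex = [rng.gauss(0.0, 1.0) for _ in range(n)]  # assumption: sigma = 1
    ey = [rng.gauss(0.0, 1.0) for _ in range(n)]
    table = {}
    for mx in means:
        cx = cumulate_with_mean(ex, mx)
        for my in means:
            table[(mx, my)] = pearson(cx, cumulate_with_mean(ey, my))
    return table


means = paper_grid()
table = mean_grid_correlations(means)
# Print the corners and centre of the grid as a compact summary.
for mx in (-0.2, 0.0, 0.2):
    row = "  ".join(f"{table[(mx, my)]:+.2f}" for my in (-0.2, 0.0, 0.2))
    print(f"x mean {mx:+.2f}: {row}")
```

Reusing one pair of error series across the whole grid isolates the effect of the added means, since nothing else changes from one cell to the next.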
The results for one run of this model are shown in Table II. When x̄ and ȳ are of the same sign, the two series are strongly positively correlated, but when they are of opposite sign, strong negative correlations result. By choosing different mean values for two unrelated series of random observations and then expressing those observations in cumulative form, it is thus possible to obtain almost any desired value of r.
Indeed, it is not necessary that the two series be unrelated in order to be able to select r at will. The two simple examples in Table III and Figure 2 show that it is quite easy to completely reverse the sense of a relationship by using cumulative series instead of raw data.
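A reversal of this kind is easy to construct. The sketch below uses two made-up pairs of series, which are hypothetical illustrations and not the data of Table III or Figure 2. In case A the raw values are perfectly negatively correlated but both means are positive; in case B the raw values are perfectly positively correlated but the means have opposite signs.

```python
from itertools import accumulate
from math import sqrt


def pearson(a, b):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / sqrt(saa * sbb)


def raw_and_cumulated_r(x, y):
    """Return (r of raw series, r of cumulated series)."""
    return pearson(x, y), pearson(list(accumulate(x)), list(accumulate(y)))


n = 20
# Case A: x and y alternate out of phase (y = 4 - x), both with mean 2.
xa = [1.0 if k % 2 == 0 else 3.0 for k in range(n)]
ya = [3.0 if k % 2 == 0 else 1.0 for k in range(n)]
# Case B: x and y move together (y = x - 2), with means +1 and -1.
xb = [1.5 if k % 2 == 0 else 0.5 for k in range(n)]
yb = [v - 2.0 for v in xb]

for label, (x, y) in (("A", (xa, ya)), ("B", (xb, yb))):
    r_raw, r_cum = raw_and_cumulated_r(x, y)
    print(f"case {label}: raw r = {r_raw:+.3f}, cumulated r = {r_cum:+.3f}")
```

In both cases the sign of the cumulated coefficient is set by the signs of the means, not by the relationship between the raw values.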
The high correlations between cumulative series reported by Andrews and others (1971) result from the fact that they used random numbers with a mean value of 50 for both sets of observations. By choosing different mean values for their initial series, they could have obtained any value they wanted.
Another quirk of the correlation coefficient between cumulated series is that it depends to a certain extent on the order in which the observations are cumulated. Only the final point in the cumulated series has a fixed value for a given set of points; the other points may assume different values depending on which point is chosen as the initial one and the sequence of the points which follow. When non-cumulated series are correlated, the order in which the pairs of observations are taken does not affect the correlation coefficient; in the case of cumulated series, however, variations in the magnitude of the coefficient do occur when different orders of accumulation are followed.
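The dependence on order can be checked directly by correlating the same pairs of observations in two different sequences. The sketch below is illustrative only: it assumes zero-mean Normal data with σ = 1 from Python's standard `random` generator, and `raw_and_cumulated_r` is a hypothetical helper.

```python
import random
from itertools import accumulate
from math import sqrt


def pearson(a, b):
    """Product-moment correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / sqrt(saa * sbb)


def raw_and_cumulated_r(pairs):
    """Return (raw r, cumulated r) for (x, y) pairs taken in the given order."""
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    return pearson(xs, ys), pearson(list(accumulate(xs)), list(accumulate(ys)))


rng = random.Random(2)
pairs = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(500)]
# Shuffle whole pairs so each x stays matched with its own y.
shuffled = pairs[:]
rng.shuffle(shuffled)

for label, seq in (("original order", pairs), ("shuffled order", shuffled)):
    r_raw, r_cum = raw_and_cumulated_r(seq)
    print(f"{label}: raw r = {r_raw:+.4f}, cumulated r = {r_cum:+.4f}")
```

The raw coefficient is the same for both orderings apart from floating-point rounding, whereas the cumulated coefficient generally changes, because only the final cumulated point is fixed by the set of observations.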
In summary, the transformation of a set of observations to cumulative form destroys the independence of the observations and makes the correlation coefficient strongly dependent on the scale used for measurement and on the length of the series. Correlating cumulated series is thus a procedure whose use should be restricted to special circumstances or completely eliminated.
Acknowledgements
A portion of this work was supported by NSF Grant GA-4128 to Valmore C. LaMarche. Computer time was supplied by the University of Arizona Computer Center. I thank John Sims for helpful criticism of the manuscript.