Published online by Cambridge University Press: 22 May 2009
Evaluation research involves the use of the scientific method to evaluate the impact of public programs. The cross-disciplinary evaluation research literature provides a useful orientation for examining the activities of international organizations. The primary advantage of this approach is that more dependable cause and effect statements are possible. Its major limitations relate to applicability; they vary from case to case and should not be underestimated. Yet they do not preclude its beneficial use by international organization scholars and policy makers. The smallpox eradication program of the World Health Organization serves as the basis for an illustration of the evaluation research methodology.
1 International Organization 24 (Winter 1970): 160–61.
2 Boruch, Robert F., “Bibliography: Illustrative Randomized Field Experiments For Planning and Evaluation,” mimeo, Evanston, Illinois, Northwestern University, 1974.
3 Cf. Weiss, Carol H., Evaluation Research, Methods of Assessing Effectiveness (Englewood Cliffs, N.J.: Prentice-Hall, 1972): 129–54.
4 Riecken, Henry, Boruch, Robert F., Campbell, Donald T., Caplan, Nathan, Glennan, Thomas K. Jr., Pratt, John W., Rees, Albert and Williams, Walter, Social Experimentation: A Method for Planning Social Intervention (New York: Academic Press, 1974); Weiss, Evaluation Research; Wholey, Joseph S., Scanlon, John W., Duffy, Hugh G., Fukumoto, James S. and Vogt, Leona M., Federal Evaluation Policy, Analyzing the Effects of Public Programs (Washington, D.C.: The Urban Institute, 1973); Rivlin, Alice M., Systematic Thinking for Social Action (Washington, D.C.: The Brookings Institution, 1971); Hayes, Samuel P. Jr., Evaluating Development Projects, 2nd Edition (Paris: United Nations Educational, Scientific and Cultural Organization, 1966); Freeman, Howard E. and Sherwood, Clarence C., Social Research and Social Policy (Englewood Cliffs, New Jersey: Prentice-Hall, 1970); Bernstein, Ilene N. and Freeman, Howard E., Academic and Entrepreneurial Research, The Consequences of Diversity in Federal Evaluation Studies (New York: Russell Sage Foundation, 1975); Agency for International Development, Evaluation Handbook, Second Edition (Washington, D.C.: Agency for International Development, 1974); Caro, Francis G., ed., Readings in Evaluation Research (New York: Russell Sage Foundation, 1971); Lyons, Gene M., ed., Social Research and Public Policies, The Dartmouth/OECD Conference (Hanover, New Hampshire: University Press of New England, 1975); Rossi, Peter H. and Williams, Walter, eds., Evaluating Social Programs: Theory, Practice, and Politics (New York: Seminar Press, 1972); Weiss, Carol H., ed., Evaluating Action Programs, Readings in Social Action and Education (Boston: Allyn and Bacon, 1972); Struening, Elmer L. and Guttentag, Marcia, eds., Handbook of Evaluation Research, Volume 1 (Beverly Hills, California: Sage Publications, Inc., 1975); Guttentag, Marcia and Struening, Elmer L., eds., Handbook of Evaluation Research, Volume 2 (Beverly Hills, California: Sage Publications, Inc., 1975); Leonard, William R., Jenny, Beat Alexander and Nwali, Offia, UN Development Aid, Criteria and Methods of Evaluation (New York: Arno Press, 1971); and Suchman, Edward A., Evaluation Research, Principles and Practice in Public Service and Social Action Programs (New York: Russell Sage Foundation, 1967).
5 Suchman, Evaluation Research: 31–2.
6 Wholey et al., Federal Evaluation Policy: 19.
7 Bernstein and Freeman, Academic and Entrepreneurial Research: 1.
8 The work of Donald T. Campbell and associates provides the methodological orientation for the evaluation research literature. The basic works are Campbell, Donald T. and Stanley, Julian C., Experimental and Quasi-Experimental Designs for Research (Chicago: Rand McNally, 1963); Campbell, Donald T., “Reforms as Experiments,” American Psychologist 24 (April 1969): 409–29; Campbell, Donald T. and Ross, H. Laurence, “The Connecticut Crackdown on Speeding, Time Series Data in Quasi-Experimental Analysis,” Law and Society Review 3 (August 1968): 33–53; Ross, H. Laurence, Campbell, Donald T. and Glass, Gene V., “Determining the Social Effects of a Legal Reform, The British ‘Breathalyser’ Crackdown of 1967,” American Behavioral Scientist 13 (March-April 1970): 493–509; Riecken et al., Social Experimentation; Campbell, Donald T., “Assessing the Impact of Planned Social Change,” in Lyons, ed., Social Research and Public Policies: 3–45; and Cook, Thomas D. and Campbell, Donald T., “The Design and Conduct of Quasi-Experiments and True Experiments in Field Settings,” in Dunnette, M. D., ed., Handbook of Industrial and Organizational Psychology (Chicago: Rand McNally, 1975): 223–326.
9 The types of validity, rival explanations, and research designs discussed here were originally presented in Cook and Campbell, “The Design and Conduct of Quasi-Experiments and True Experiments in Field Settings.” The terminology is that of Cook and Campbell.
10 For elaboration on the types of validity and the rival explanations see Cook and Campbell, “The Design and Conduct of Quasi-Experiments and True Experiments in Field Settings”: 224–6.
11 Cook and Campbell, “The Design and Conduct of Quasi-Experiments and True Experiments in Field Settings”: 245.
12 Basic to the evaluation research approach is the idea that it is impossible to establish that a social action program actually caused an observed impact. A causal explanation is retained if three conditions are present: (1) the cause precedes the effect in time; (2) treatments covary with effects (if the cause and effect are not related then one could not cause the other); and (3) there are no alternative explanations that are more plausible. The impact hypothesis deals with the first and second points while the rival explanations deal with all three points.
13 Campbell, “Reforms as Experiments”: 411.
14 Ross, Campbell and Glass, “Determining the Social Effects of a Legal Reform”: 495.
15 Cook and Campbell, “The Design and Conduct of Quasi-Experiments and True Experiments in Field Settings”: 229.
16 Cook and Campbell, ibid: 238–5.
17 Cook and Campbell, ibid: 234–48.
18 Campbell and Stanley, Experimental and Quasi-Experimental Designs for Research; and Cook and Campbell, “The Design and Conduct of Quasi-Experiments and True Experiments in Field Settings”: 245–98.
19 These design types differ primarily in regard to their control of explanations in the internal validity category. Experimental designs equate treatment and control groups by randomization, thereby isolating the effect of the treatment and helping to rule out numerous rival explanations in the internal validity category. When quasi-experimental designs are used the rival explanations must be considered one at a time. The use of time-series data and non-equivalent control groups is frequent in the quasi-experimental designs. These designs are suggested for use when the more powerful experimental designs are not feasible. Pre-experimental designs do not use randomization and make insufficient use of the quasi-experimental controls. They are designs of last resort.
20 Both this design and the interrupted time-series design are most powerful when the impact is abrupt and can be noted as a pretest-posttest (O4 – O5) change because plausible rival hypotheses are more easily controlled for a short period. Both designs require that data be available for an extended period prior to the initiation of the program. This means that the data probably were collected for some purpose other than to evaluate the program. These designs would be most useful for evaluating macro programs that cover environments where a social indicator system has been in existence for some time.
21 For an introduction to these matters and a discussion of their relevance for the study of international organizations see Hoole, Francis W., “The Behavioral Science Orientation to the Study of International Administration,” in Jordan, Robert S., ed., Multinational Cooperation, Economic, Social, and Scientific Development (New York: Oxford University Press, 1972): 327–64.
22 For background on the WHO smallpox eradication program see WHO Expert Committee on Smallpox, First Report, World Health Organization Technical Report Series No. 283 (Geneva: World Health Organization, 1964); Smallpox Eradication, Report of a WHO Scientific Group, World Health Organization Technical Report Series No. 393 (Geneva: World Health Organization, 1968); WHO Expert Committee on Smallpox Eradication, Second Report, World Health Organization Technical Report Series No. 493 (Geneva: World Health Organization, 1972); Special issue entitled “Smallpox, Point of No Return,” World Health (February-March 1975); “Smallpox Eradication,” in The Second Ten Years of the World Health Organization, 1958–67 (Geneva: World Health Organization, 1968): 105–11; “Smallpox Eradication Programme,” Annex 15, Nineteenth World Health Assembly, Part I, World Health Organization Official Records No. 151 (Geneva: World Health Organization, 1966): 106–21; “Smallpox,” Handbook of Resolutions and Decisions of the World Health Assembly and Executive Board, Volume I, 1948–72 (Geneva: World Health Organization, 1973): 89–97.
23 WHO Expert Committee on Smallpox Eradication, Second Report: 9.
24 WHO will declare the disease eradicated only after a two year period during which no cases are located. What has the WHO program cost so far? It has been estimated that the countries with smallpox eradication programs have spent approximately two hundred million dollars and that international aid has amounted to another fifty million dollars. Of the latter amount roughly twenty-two million dollars came from the WHO regular budget, fourteen million dollars came from the US, eight million dollars came from the USSR, and the remainder (six million dollars) was contributed by other countries. The most impressive program in the history of public health has cost approximately two hundred and fifty million dollars. Surely the smallpox eradication program of the World Health Organization has been a bargain. It has been estimated that one hundred and fifty-one million dollars were spent in 1968 in protecting individuals from smallpox in the United States, even though there had not been a smallpox case in the US for several years. In 1971 the WHO smallpox eradication program succeeded in eliminating smallpox from the Western hemisphere and compulsory smallpox vaccination was abolished in the United States. Most of the money being spent on the disease was saved. Because of the WHO smallpox eradication program the savings in the US between 1971 and 1975 amounted to more than all of the contributions made by the United States government to the World Health Organization since the creation of the agency in 1948. These cost estimates were taken from Tomiche, F.J., “Fruits of Victory,” World Health (February-March 1975): 26–28.
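The cost breakdown in this note can be checked with a few lines of arithmetic. The sketch below simply restates the figures quoted above (in millions of US dollars); the variable names are illustrative and not part of the original analysis.

```python
# Figures in millions of US dollars, as quoted in the note (from Tomiche).
international_aid = {
    "WHO regular budget": 22,
    "United States": 14,
    "USSR": 8,
    "other countries": 6,
}
national_spending = 200  # spent by countries with eradication programs

aid_total = sum(international_aid.values())
program_total = national_spending + aid_total

print(aid_total, program_total)  # 50 250
```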
25 I am especially grateful to Mr. Jacques Copland for his assistance.
26 Gujarati, Damodar, “Use of Dummy Variables in Testing Equality Between Sets of Coefficients in Two Linear Regressions: A Note,” The American Statistician 24, 1 (1970): 50–52; and Gujarati, Damodar, “Use of Dummy Variables in Testing for Equality Between Sets of Coefficients in Linear Regressions: A Generalization,” The American Statistician 24, 5 (1970): 18–22.
27 For a general treatment of regression analysis see Johnston, J., Econometric Methods, 2nd Edition (Tokyo: McGraw-Hill Kogakusha, Ltd., 1972).
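A minimal sketch of the dummy-variable technique cited in this note may help readers unfamiliar with it. It assumes a single time regressor t and a dummy D that equals 1 from the program's start onward, and fits y = a0 + a1·D + b0·t + b1·(D·t), so that a1 and b1 are the differential intercept and slope after the break. The function names and the noise-free series are hypothetical, chosen only for illustration; the sketch estimates the coefficients and does not compute the standard errors that a significance test would need.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def dummy_variable_fit(t, y, break_at):
    """Fit y = a0 + a1*D + b0*t + b1*(D*t), with D = 1 when t >= break_at."""
    rows = []
    for ti in t:
        d = 1.0 if ti >= break_at else 0.0
        rows.append([1.0, d, float(ti), d * ti])
    return ols(rows, y)  # returns [a0, a1, b0, b1]


# Hypothetical noise-free series: y = 60 - t before the break, y = 90 - 6t after.
t = list(range(12))
y = [60 - ti if ti < 6 else 90 - 6 * ti for ti in t]
a0, a1, b0, b1 = dummy_variable_fit(t, y, break_at=6)
# Recovers a0 = 60, a1 = 30, b0 = -1, b1 = -5 up to floating-point error.
print(round(a0, 6), round(a1, 6), round(b0, 6), round(b1, 6))
```

Nonzero estimates of a1 or b1 indicate a shift in level or trend at the break; whether the shift is statistically significant is a separate question.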
28 Unfortunately the effect of the violation of the stationarity assumption is not well understood.
29 Durbin, J. and Watson, G. S., “Testing for Serial Correlation in Least Squares Regression,” Part I, Biometrika 37 (1950): 409–28; and Durbin, J. and Watson, G. S., “Testing for Serial Correlation in Least Squares Regression,” Part II, Biometrika 38 (1951): 159–78.
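For readers who want to see what the statistic cited in this note measures, the sketch below computes the Durbin-Watson d from the residuals of a simple linear trend. The statistic is the sum of squared successive differences of the residuals divided by their sum of squares; values near 2 suggest little first-order serial correlation, values toward 0 suggest positive correlation, and values toward 4 suggest negative correlation. The series is made up for illustration and is not the smallpox data.

```python
def ols_line(x, y):
    """Fit y = a + b*x by ordinary least squares; return (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def durbin_watson(residuals):
    """d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2, which lies between 0 and 4."""
    num = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical yearly series, made up for illustration only.
years = list(range(10))
series = [60, 58, 59, 55, 54, 52, 53, 49, 47, 46]

a, b = ols_line(years, series)
resid = [yi - (a + b * ti) for ti, yi in zip(years, series)]
dw = durbin_watson(resid)
print(round(dw, 2))
```

Judging whether a given d is significant requires the tabulated bounds in the Durbin and Watson papers, which this sketch does not reproduce.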
30 Theil, H. and Nagar, A. L., “Testing the Independence of Regression Disturbances,” Journal of the American Statistical Association 56 (1961): 793–806.
31 In future analyses consideration will be given to trying to adapt Box-Jenkins ARIMA models, which have the advantage of handling nonlinearities and higher order autoregressive schemes for the disturbances, for analysis of the smallpox data. For information on autoregressive integrated moving average models see Box, G.E.P. and Jenkins, G.M., Time Series Analysis: Forecasting and Control (San Francisco: Holden-Day, 1970); Naylor, Thomas H., Seaks, Terry G. and Wichern, D. W., “Box-Jenkins Methods: An Alternative to Econometric Models,” International Statistical Review 40 (1972): 123–37; Box, G.E.P. and Tiao, G.C., “A Change in Level of a Nonstationary Time Series,” Biometrika 52 (1965): 181–92; and Glass, G.V., Tiao, G.C. and Maguire, T.O., “The 1900 Revision of German Divorce Laws: Analysis of Data as a Time-Series Quasi-Experiment,” Law and Society Review 6 (1971): 539–62.
32 All statistical analysis reported in this paper was done on the CDC 6600 computer at Indiana University. A revised version of the BMD02R program was prepared for this analysis by David H. Handley. I am most grateful to him.
33 What would have happened had the WHO intensified program not been initiated? It is unlikely that the trend would have gone downward any faster without the WHO program. That circumstance would mean that the smallpox eradication program actually slowed down progress and did more harm than good. It is more likely that the number of countries reporting cases would have leveled off at around forty-four and stayed near that level for a period. This scenario would reflect the situation where all countries that were going to bring smallpox under control for a period had done so. Another likely alternative is that the trend would have continued to move downward at approximately the same rate. Under this assumption smallpox would have been eliminated in countries as public health infrastructures were developed, in the course of normal events, and the general level of health improved. It is, of course, impossible to say definitely what would have happened had the WHO intensified program not been undertaken.
34 Chow, G.C., “Tests of Equality Between Sets of Coefficients in Two Linear Regressions,” Econometrica (1960): 591–605.
35 Campbell, “Reforms as Experiments”: 411.
36 Ross, Campbell and Glass, “Determining the Social Effects of a Legal Reform”: 495.
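As a rough illustration of the test cited in this note, the sketch below computes a Chow-type F statistic for a break between two segments of a time series. It assumes the simplest case, a straight-line trend with k = 2 coefficients (intercept and slope) in each segment, and compares the residual sum of squares of one pooled line against that of two separate lines. The function names and both series are hypothetical and are not the data analyzed in the article.

```python
def rss_of_line(x, y):
    """Residual sum of squares from an OLS fit of y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def chow_f(x1, y1, x2, y2, k=2):
    """F = [(RSS_pooled - RSS_split) / k] / [RSS_split / (n1 + n2 - 2k)]."""
    rss_pooled = rss_of_line(x1 + x2, y1 + y2)
    rss_split = rss_of_line(x1, y1) + rss_of_line(x2, y2)
    df_den = len(x1) + len(x2) - 2 * k
    return ((rss_pooled - rss_split) / k) / (rss_split / df_den)


# Hypothetical series with a change in trend after the sixth observation.
pre_t, post_t = list(range(6)), list(range(6, 12))
pre_y = [60, 58, 59, 56, 55, 54]
post_y = [50, 44, 39, 33, 27, 22]
f_break = chow_f(pre_t, pre_y, post_t, post_y)

# Hypothetical series with the same trend throughout (small alternating noise).
steady = [60 - ti + (0.5 if ti % 2 == 0 else -0.5) for ti in range(12)]
f_steady = chow_f(pre_t, steady[:6], post_t, steady[6:])

print(round(f_break, 2), round(f_steady, 2))
```

The statistic is referred to an F distribution with k and n1 + n2 − 2k degrees of freedom; a large value, as in the first series, is evidence that the two segments do not share one set of coefficients.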
37 See the citations in footnote 31. Cook and Campbell note that a sample of fifty cases should be available for use of these techniques. See Cook and Campbell, “The Design and Conduct of Quasi-Experiments and True Experiments in Field Settings”: 275.
38 I know of only one published study in the international organization field that has used evaluation research methodology. It was focused on international integration. See Caporaso, James A. and Pelowski, Alan L., “Economic and Political Integration in Europe: A Time-Series Quasi-Experimental Analysis,” American Political Science Review 65 (1971): 418–33.
39 For an excellent general discussion of the potential ethical problems involved in evaluation research see Riecken et al., Social Experimentation: 245–69.
40 There is an additional methodological problem which may be of special salience for some policy makers: how can the impact of a particular agency activity be evaluated when a program is a joint one? The activity by the agency must be isolated and an experimental or quasi-experimental design should be set up within the general program research design if it is imperative to know the separate impact.
41 The cost of evaluation research will not be discussed in this section but may be seen as an organizational problem by some readers. Not all programs require evaluation, and not all programs being evaluated require the same level of funding. Based upon the experience in evaluating domestic programs in the United States it is suggested that between one and two percent of the budget for program activities be allocated for systematic evaluation research. This percentage can be adjusted as experience is gained. For additional information on the US experience see Wholey et al., Federal Evaluation Policy: 77–82, and Bernstein and Freeman, Academic and Entrepreneurial Research: 140.
42 Campbell, “Reforms as Experiments”: 410.
43 Campbell, “Assessing the Impact”: 35.
44 Cf. Weiss, Evaluation Research: 104–7.
45 Weiss, Evaluation Research: 2.