
On the Theory of a Set of Tests which differ Only in Length

Published online by Cambridge University Press:  01 January 2025

Walter Kristof*
Affiliation:
Educational Testing Service

Abstract

This paper presents a contribution to the sampling theory of a set of homogeneous tests which differ only in length, test length being regarded as an essential test parameter. Observed variance-covariance matrices of such measurements are taken to follow a Wishart distribution. The familiar true-score-and-error concept of classical test theory is employed. Upon formulation of the basic model it is shown that, in a combination of such tests forming a “total” test, the signal-to-noise ratios of the components are additive, and that the inverse of the population variance-covariance matrix of the component measures has all of its off-diagonal elements equal, regardless of distributional assumptions. This fact facilitates the subsequent derivation of a statistical sampling theory, since there are at most m + 1 free parameters, where m is the number of component tests. In developing the theory, the cases of known and unknown test lengths are treated separately. For both cases maximum-likelihood estimators of the relevant parameters are derived. It is argued that the resulting formulas remain reasonable even if the distributional assumptions are too narrow. Under these assumptions, however, maximum-likelihood ratio tests of the validity of the model and of hypotheses concerning the reliability and the standard error of measurement of the total test are given. It is shown in each case that the maximum-likelihood equations possess precisely one acceptable solution under rather natural conditions. The methods can be applied without the use of a computer. Two numerical examples are appended by way of illustration.
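The two structural claims in the abstract (additive signal-to-noise ratios and equal off-diagonal elements of the inverse covariance matrix) can be checked numerically under one common reading of "tests which differ only in length". The sketch below assumes, as our own illustration rather than the paper's stated parameterization, that component i has length lam[i], true-score variance lam[i]**2 * var_t and error variance lam[i] * var_e, with uncorrelated errors. The function names, the example lengths and the variance values are hypothetical. Exact rational arithmetic is used so that equalities can be tested without rounding error.

```python
from fractions import Fraction

# Assumed model (hypothetical, not taken from the paper):
# component i has length lam[i], true-score variance lam[i]**2 * var_t,
# error variance lam[i] * var_e, and errors are mutually uncorrelated.

def covariance(lam, var_t, var_e):
    """Population covariance matrix: var_t * lam lam' + var_e * diag(lam)."""
    m = len(lam)
    return [[var_t * lam[i] * lam[j] + (var_e * lam[i] if i == j else 0)
             for j in range(m)] for i in range(m)]

def inverse(lam, var_t, var_e):
    """Closed-form inverse of the assumed covariance via Sherman-Morrison."""
    m = len(lam)
    # D^{-1} lam is the constant vector 1/var_e, so the rank-one correction
    # adds the same constant c to every cell: all off-diagonals equal c.
    c = -var_t / (var_e * (var_e + var_t * sum(lam)))
    return [[(1 / (var_e * lam[i]) if i == j else 0) + c
             for j in range(m)] for i in range(m)]

def matmul(a, b):
    """Plain matrix product for lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def snr(length, var_t, var_e):
    """Signal-to-noise ratio: true-score variance over error variance."""
    return (length ** 2 * var_t) / (length * var_e)

lam = [Fraction(1), Fraction(2), Fraction(3)]
var_t, var_e = Fraction(4), Fraction(1)

sigma = covariance(lam, var_t, var_e)
sigma_inv = inverse(lam, var_t, var_e)
product = matmul(sigma, sigma_inv)
identity = [[Fraction(int(i == j)) for j in range(3)] for i in range(3)]
print(product == identity)                                 # True
print(sigma_inv[0][1], sigma_inv[1][2], sigma_inv[0][2])   # -4/25 -4/25 -4/25

# The total test has length sum(lam); its SNR equals the sum of component SNRs.
component_snr = [snr(l, var_t, var_e) for l in lam]
print(sum(component_snr) == snr(sum(lam), var_t, var_e))   # True
```

Under this assumed model the component SNR reduces to lam[i] * var_t / var_e, which is linear in length, so additivity over a total test of length sum(lam) follows directly. The covariance matrix is then fixed by the lengths and two variances; because lengths and var_t can be rescaled against each other, this is consistent with the abstract's count of at most m + 1 free parameters.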

Type
Original Paper
Copyright
Copyright © 1971 The Psychometric Society


Footnotes

* This research was supported in part by The National Institute of Child Health and Human Development, under Research Grant 1 PO1 HDO1762.
