Psychometrics and Psychological Assessment

doi:10.1017/9781108235433.002

2 - Psychometrics and Psychological Assessment

from Part I - General Issues in Clinical Assessment and Diagnosis

Published online by Cambridge University Press: 06 December 2019

John Hunsley and Teresa Allan

Edited by Martin Sellbom and Julie A. Suhr


Martin Sellbom: University of Otago, New Zealand
Julie A. Suhr: Ohio University


Summary

In this chapter, we address the key psychometric concepts of standardization, reliability, validity, norms, and utility. In doing so, we focus primarily on classical test theory (CTT) – the psychometric framework most commonly used in the clinical assessment literature – which disaggregates a person’s observed score into true score and error components. Given its growing use with psychological instruments, we also present basic information on aspects of item response theory (IRT). In contrast to CTT, IRT assumes that some test items are more relevant than other items for evaluating a person’s true score and that the extent to which an item accurately measures a person’s ability can differ across ability levels. After presenting the central aspects of these two frameworks, we conclude the chapter with a discussion of the need to consider cultural/diversity issues in the development, validation, and use of psychological instruments.
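The two frameworks named above can be illustrated with a small numerical sketch. This is not the authors' material. It assumes normally distributed true scores and errors for the CTT part. For the IRT part it uses the two-parameter logistic (2PL) model, one common IRT model chosen here only for illustration. In the 2PL, the discrimination parameter `a` captures how relevant an item is to the trait, and item information shows how measurement precision shifts across ability levels. All function names are hypothetical.

```python
import math
import random
import statistics

# --- Classical test theory (CTT): observed score X = true score T + error E ---

def simulate_ctt(n_people=5000, true_sd=10.0, error_sd=5.0, seed=1):
    """Simulate X = T + E and return the empirical reliability,
    i.e. the share of observed-score variance that is true-score variance."""
    rng = random.Random(seed)
    true_scores = [rng.gauss(50.0, true_sd) for _ in range(n_people)]
    observed = [t + rng.gauss(0.0, error_sd) for t in true_scores]
    return statistics.pvariance(true_scores) / statistics.pvariance(observed)

def theoretical_reliability(true_sd, error_sd):
    """Reliability = var(T) / (var(T) + var(E)) when T and E are uncorrelated."""
    return true_sd ** 2 / (true_sd ** 2 + error_sd ** 2)

# --- Item response theory (IRT): two-parameter logistic (2PL) model ---

def p_endorse_2pl(theta, a, b):
    """Probability of a keyed/correct response at trait level theta,
    given item discrimination a and item difficulty (location) b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information_2pl(theta, a, b):
    """Fisher information of a 2PL item at theta: a^2 * P * (1 - P).
    Higher information means more precise measurement at that trait level."""
    p = p_endorse_2pl(theta, a, b)
    return a * a * p * (1.0 - p)

if __name__ == "__main__":
    print("CTT reliability (theory):", theoretical_reliability(10.0, 5.0))
    print("CTT reliability (simulated):", round(simulate_ctt(), 3))
    # A highly discriminating item (a=2) vs. a weakly discriminating one (a=0.5),
    # both located at b=0, evaluated at several trait levels.
    for theta in (-3.0, 0.0, 3.0):
        strong = item_information_2pl(theta, 2.0, 0.0)
        weak = item_information_2pl(theta, 0.5, 0.0)
        print(f"theta={theta:+.0f}  info(a=2)={strong:.4f}  info(a=0.5)={weak:.4f}")
```

At theta = 0 the strongly discriminating item carries far more information (1.0 vs. 0.0625). At theta = +3 the ordering reverses. This reversal is one concrete sense in which an item's precision can differ across ability levels, something a single CTT reliability coefficient does not express.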

Keywords

standardization; reliability; validity; norms; utility; classical test theory; item response theory

Type: Chapter
Information: The Cambridge Handbook of Clinical Assessment and Diagnosis, pp. 9–24


Publisher: Cambridge University Press

Print publication year: 2019

