Overview
In their focal article, Sackett et al. (in press) describe implications of their new meta-analytic estimates of the validity of widely used predictors for employee selection. Contradicting the received wisdom of Schmidt and Hunter (1998), Sackett et al. conclude that predictor methods with content specifically tailored to jobs generally have greater validity for predicting job performance than general measures reflecting psychological constructs (e.g., cognitive abilities, personality traits). They also point out that the standard deviations around their mean meta-analytic validity estimates are often large, leading to their question "why the variability?" (p. x). They suggest many legitimate contributors to that variability.
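To make concrete why large standard deviations invite a search for moderators, consider a minimal sketch of an 80% credibility interval; the numbers are purely hypothetical assumptions of ours, not values from Sackett et al.:

```python
# Hypothetical illustration (not values from Sackett et al.): a large SD around a
# meta-analytic mean validity implies a wide 80% credibility interval for true validity.
mean_rho, sd_rho = 0.40, 0.15
lower, upper = mean_rho - 1.28 * sd_rho, mean_rho + 1.28 * sd_rho
print(f"80% credibility interval: [{lower:.2f}, {upper:.2f}]")  # about [0.21, 0.59]
```

An interval that wide implies a predictor could be highly useful in some jobs and settings and far less useful in others, which is exactly the kind of spread that substantive moderators could help explain.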
We propose an additional moderator variable of critical importance: predictor–criterion construct congruence, which we believe accounts for a great deal of the variability in validity coefficients found in meta-analyses. That is, the extent to which what is measured is congruent with what is predicted is an important determinant of the level of validity obtained. Sackett et al. (2022) acknowledge that the strongest predictors in their re-analysis are job-specific measures and that a "closer behavioral match between predictor and criterion" (p. 2062) might contribute to higher validities. Many in our field have also noted the importance of "behavioral consistency" between predictors and criteria relevant to selection, while also arguing for another type of congruence: alignment between the constructs in the predictor space and those in the criterion space (e.g., Bartram, 2005; Campbell et al., 1993; Campbell & Knapp, 2001; Hogan & Holland, 2003; Hough, 1992; Hough & Oswald, 2005; Pulakos et al., 1988; Sackett & Lievens, 2008; Schmitt & Ostroff, 1986).
The above reflects an important distinction between two types of congruence: behavior-based congruence and construct-based congruence. Behavior-based congruence exists when "past behavior predicts future behavior," as is possible for jobs requiring prior experience and for behavior-oriented employment assessments such as interviews, biodata, and work samples. Behavior-based assessments can vary a great deal across jobs but tend to ask about past experiences that are influenced by a complex mix of KSAOs. By contrast, construct-based congruence aligns employment tests of job-relevant KSAOs (e.g., verbal and math skills, conscientiousness) with relevant work criteria, such as technical performance or counterproductive work behavior (e.g., Campbell & Wiernik, 2015).
What we are suggesting, strongly, is that regardless of the approach to congruence adopted in selection, congruence between predictor and criterion constructs is a key factor influencing the levels of validity found across all of the personnel selection assessment tools in the Sackett et al. (2022) meta-analysis. I-O psychologists have internalized that methods are not the same as constructs (Arthur & Villado, 2008); virtually any KSAO can be measured through methods such as the interview, biodata, and work samples. If the conclusion is "the structured interview tends to be the most valid predictor in selection contexts," it is not only because of the structure. Rather, it is ideally because of behavior-based congruence, such that what the structured interview measures taps the same KSAOs that the performance criterion measures. Such congruence is much more likely when a selection system is developed from a job analysis of relevant KSAOs (Morgeson et al., 2019; Steel et al., 2006) and much less likely otherwise. A lack of predictor–criterion congruence is thus an important source of error in meta-analytic correlations: correlations can be inflated by shared method variance (e.g., likability influencing both interview ratings and performance ratings; Schmitt et al., 1996) or attenuated by KSAO incongruence.
In an ever-changing world of work, understanding the relationships between predictor and criterion constructs is key to developing highly valid selection systems quickly; validities are most useful when they are transportable and generalizable to new jobs. In addition to what can properly be inferred from meta-analyses such as Sackett et al. (2022), synthetic validation is the foundation upon which such systems can be built. Highly valid selection systems can be designed intentionally, efficiently, and effectively by reviewing meta-analytic validity evidence to synthetically estimate validity for the situation at hand. In a world where selection systems are subject to government regulation, and where social justice requires that those systems be fair to all, strategies that rely on known inputs and produce known outcomes, such as predictor–criterion construct congruence enabled through synthetic validity strategies, are key to the successful design of selection systems. Artificial intelligence talent management solutions run the risk of black-box empiricism, where the data and/or the underlying algorithm may be opaque and resistant to the construct-level understanding afforded by congruence-based approaches.
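To illustrate the kind of synthetic validity arithmetic we have in mind, the following minimal sketch combines meta-analytic validities for construct-congruent predictors into an estimate of composite validity. The predictor validities, intercorrelations, and weights shown are illustrative assumptions of ours, not values taken from Sackett et al. (2022):

```python
import numpy as np

# Hypothetical meta-analytic inputs for three predictors chosen because their
# constructs are congruent with the target job's performance requirements.
# All numbers are illustrative assumptions, not estimates from Sackett et al. (2022).
r_xy = np.array([0.42, 0.38, 0.25])   # predictor-criterion validities
R_xx = np.array([[1.00, 0.35, 0.20],  # predictor intercorrelations
                 [0.35, 1.00, 0.25],
                 [0.20, 0.25, 1.00]])
w = np.ones(3)                        # unit weights for a simple composite

# Validity of the unit-weighted composite: r_cy = (w'r_xy) / sqrt(w'R_xx w)
composite_validity = (w @ r_xy) / np.sqrt(w @ R_xx @ w)
print(f"Estimated composite validity: {composite_validity:.2f}")  # about 0.49
```

The point is not the particular numbers but the logic: when the constructs entering the composite are chosen for their congruence with the criterion constructs of the job at hand, validity estimates assembled this way rest on known, inspectable inputs rather than on black-box empiricism.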
In the pages that follow, we provide representative evidence supporting our contention that predictor–criterion construct congruence is an important determinant of the magnitude of validity coefficients.
Interview: structured and unstructured
Sackett et al. found that the operational mean validity of the unstructured interview is 0.19, whereas the validity of the structured interview is 0.42, the highest of all the predictors they studied. Huffcutt et al. (2001) examined meta-analytically the constructs measured in employment interviews and stated that "at least part of the reason why structured interviews tend to have higher validity is because they focus more on constructs that have a stronger relationship with job performance" (p. 897). Our point precisely: when the substance of what the predictor measures corresponds to the substance of what the criterion measures, validity will be higher. For example, the critical incident approach to job analysis yields constructs grounded in work behaviors that readily become the basis for developing behavior-based and situational judgment interviews. Both are structured interviews, and both have high criterion-related validity (e.g., Huffcutt et al., 2014).
Empirically keyed biodata
Sackett et al. (in press) report the mean validity of empirically keyed biodata inventories as 0.38 and the mean validity of rationally keyed biodata as 0.22. These updated findings are based on the meta-analysis of Speer et al. (2022), who explained that "when biodata scores were correlated with theoretically aligned performance ratings, rational scoring resulted in similar validity coefficients as empirical scoring" (p. 1678). Hough and Paullin (1994) reached a similar conclusion when comparing biodata measures developed by rational (theoretical) and empirical means against the same criteria. In short, when predictor–criterion construct congruence is high, validity coefficients are not only higher but also stand to be more consistent across samples, because they are less likely to capitalize on chance.
Work samples and assessment centers
Sackett et al. (in press) report the mean validity of work samples as 0.33. The updated, appropriately corrected mean validity of assessment centers was also reported as 0.33. Clearly, the development of work samples is based on an understanding of the job and the requirements of the work, and Sackett et al. point out that assessment centers are, in effect, work samples for managerial jobs. It goes without saying, but we will say it anyway: predictor–criterion construct congruence is high for both work samples and assessment centers.
Situational judgment tests
Both types of situational judgment tests for which Sackett et al. (in press) separately report validities, knowledge and behavioral tendency (0.26 for both), are also typically developed to measure job requirements and work situations identified through job/work analyses. As noted above, situational judgment tests and behavior-based (structured) interviews are often developed using the critical incident method of job analysis and can be considered job simulations, albeit low-fidelity ones (McDaniel & Nguyen, 2001; Motowidlo et al., 1990). Like us, others have suggested that one reason situational judgment tests sometimes do not correlate with performance is a lack of predictor–criterion congruence (e.g., Whetzel & Reeder, 2016). One example of evidence for this point of view comes from research on the criterion-related validity of situational judgment tests for predicting performance in medical school: interpersonally oriented situational judgment tests predict performance in patient care-oriented medical schools but not in basic science-oriented medical schools (Lievens et al., 2005). This finding echoes our general point: the greater the predictor–criterion construct congruence, the higher the validity.
General mental ability (GMA)
Perhaps the most important finding in the Sackett et al. meta-analysis is that the mean validity coefficient for general mental ability (GMA) tests predicting overall job performance is lower than in previous meta-analyses. Sackett et al. suggest that one possible reason for these lower validities is the greater importance of interpersonal skills and team-based work in today's jobs, compared with the manufacturing-type jobs included in older meta-analyses. If so, this supports our thematic point that the requirements of the work need to be understood and predictor and criterion constructs aligned. Nye et al. (2022) meta-analyzed the differential validity of narrow cognitive abilities for predicting diverse criteria. Although they found incremental validity for narrow cognitive abilities over general mental ability, they found that the match between the narrow ability and job tasks did not have a substantial effect on validity. They state that a possible explanation is that the "breadth of each of the specific job performance dimensions assessed (e.g., task performance or organizational citizenship behavior) was incompatible with the narrower cognitive abilities" (p. 1136). Perhaps complex employee performance, and the overly simplistic supervisor ratings of it, is often unavoidable in organizations. We hope this state of affairs does not discourage efforts to measure criteria that better reflect specific abilities, which would shed greater light on the benefits of measuring specific versus general abilities.
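As a purely illustrative sketch of the incremental validity logic at issue here, the following computes the gain in R-squared from adding a narrow ability to GMA, using hypothetical correlations that are our assumptions rather than Nye et al.'s estimates:

```python
import numpy as np

def r_squared(R_xx, r_xy):
    """Squared multiple correlation from predictor intercorrelations and validities."""
    return r_xy @ np.linalg.solve(R_xx, r_xy)

# Hypothetical correlations (not Nye et al.'s values): GMA and one narrow ability
# (e.g., quantitative reasoning) each correlate with a task-performance criterion.
r_gma_y, r_narrow_y, r_gma_narrow = 0.30, 0.28, 0.60

R2_gma_only = r_gma_y ** 2
R2_both = r_squared(np.array([[1.0, r_gma_narrow],
                              [r_gma_narrow, 1.0]]),
                    np.array([r_gma_y, r_narrow_y]))
print(f"Delta R^2 for the narrow ability over GMA: {R2_both - R2_gma_only:.3f}")
```

Whether such gains materialize in practice depends, as Nye et al.'s explanation implies, on whether the criterion is measured at a level of specificity congruent with the narrow predictor.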
Personality variables
Nowhere in the research literature is there more evidence of the importance of predictor–criterion construct congruence than in the study of personality variables. First, Sackett et al. (in press) separate measures of personality variables such as conscientiousness into those that are contextualized for work settings and those that are not. In a related vein, evidence that validities are higher for personality variables that are theoretically linked to the criteria has been shown for managerial work (Bartram, 2005) as well as for a variety of other jobs and criterion constructs, including teamwork, counterproductive behavior, organizational citizenship, creativity, and work engagement (e.g., Hogan & Holland, 2003; Hough et al., 1990; Hough & Oswald, 2021; Oswald & Hough, 2008). Research on predictor–criterion construct congruence at the facet level of personality (more refined than the Big Five) continues because it produces informative patterns of convergent and discriminant validity (Hough & Oswald, 2005; Hough & Johnson, 2013; Judge et al., 2013; Steel et al., 2019). To advance this work, we need to continue investigating the refined taxonomic structures of personality that the literature is now producing and beginning to align with work-related outcomes (e.g., Soto et al., 2022). As Schneider et al. (1996) stated eloquently, "Increasing use of narrower personality and job performance constructs, in concert with construct-oriented methodology, will greatly enrich I-O personality research. If we limit ourselves only to broad traits and general laws, we may find that we have fatally cut ourselves on the blade of Occam's razor" (p. 653).
Implications for personnel selection and I-O psychology
Sackett et al. have provided a very important correction to our understanding of the level of validity of widely used predictors in personnel selection systems. However, rank-ordering predictors according to their validity for predicting overall job performance is somewhat unhelpful, especially if very important moderators of predictor–criterion validities are left on the table (e.g., jobs, settings, samples). Sackett et al. also suggest caution when comparing findings across meta-analyses without a clear understanding of the specific components underlying performance ratings, a concern we outlined in describing the behavior-based and construct-based congruence approaches to selection.
Although we can never carve nature at its exact joints, we can build useful hierarchical taxonomies of constructs at different levels of refinement, such as that found in metaBUS (www.metabus.org; Bosco et al., 2015; Bosco et al., 2017), an extensive organizational database of correlational relationships between predictors and criteria. Complementing this taxonomic structure could be the nomological-web clustering of predictor–criterion relationships described by Hough et al. (2015) for personality variables. Taxonomies allow us to use meta-analysis and synthetic validity in flexible ways to estimate the utility of newly developed prediction systems for new and newly configured jobs. Although selection researchers have yet to adopt a standard taxonomy such as that offered by metaBUS, we are further along than ever before in understanding predictor and criterion constructs, their relationships, and pertinent moderators at a refined level. And we are ready to do so, because as I-O psychologists we now embrace the criterion as multidimensional, so much so that it is almost hard to appreciate why Guion (1961) and Dunnette (1963) needed to admonish us against focusing on the "ultimate" criterion. We still need to continue pushing ahead, refining our thinking, our constructs, and our selection research even further. We should not rest on our laurels, settling for meta-analytic estimates that come with an unavoidable heterogeneity of samples, settings, and, last but not least, jobs.