In this chapter we review advanced psychometric methods for examining the validity of self-report measures of attitudes, beliefs, personality style, and other social psychological and personality constructs that rely on introspection. The methods include confirmatory factor analysis, to examine whether measurements can be interpreted as meaningful continua, and measurement invariance analysis, to examine whether items are answered the same way in different groups of people. We illustrate the methods using a measure of individual differences in openness to political pluralism, which includes four conceptual facets. To understand how the facets relate to the overall dimension of openness to political pluralism, we compare a second-order factor model and a bifactor model. We also examine whether the psychometric patterns of item responses are the same for males and females. These psychometric methods can both document the quality of obtained measurements and inform theorists about nuances of their constructs.
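As an illustration of the comparison described above, the following minimal sketch fits a second-order and a bifactor model with the Python semopy package and its lavaan-style model syntax. It is not the authors' code: the item names p1–p12, the three-items-per-facet structure, and the data file are hypothetical stand-ins for the openness-to-political-pluralism measure.

```python
# A minimal sketch (not the authors' code) comparing a second-order and a
# bifactor model for a 12-item, four-facet scale with the semopy SEM package.
# Item names p1-p12 and the data file are hypothetical stand-ins.
import pandas as pd
import semopy

data = pd.read_csv("pluralism_items.csv")  # hypothetical: one column per item

second_order = """
F1 =~ p1 + p2 + p3
F2 =~ p4 + p5 + p6
F3 =~ p7 + p8 + p9
F4 =~ p10 + p11 + p12
Openness =~ F1 + F2 + F3 + F4
"""

# In the bifactor model every item loads on a general factor as well as on
# its facet, and all factors are constrained to be mutually orthogonal.
bifactor = """
G =~ p1 + p2 + p3 + p4 + p5 + p6 + p7 + p8 + p9 + p10 + p11 + p12
F1 =~ p1 + p2 + p3
F2 =~ p4 + p5 + p6
F3 =~ p7 + p8 + p9
F4 =~ p10 + p11 + p12
G ~~ 0*F1
G ~~ 0*F2
G ~~ 0*F3
G ~~ 0*F4
F1 ~~ 0*F2
F1 ~~ 0*F3
F1 ~~ 0*F4
F2 ~~ 0*F3
F2 ~~ 0*F4
F3 ~~ 0*F4
"""

for name, desc in [("second-order", second_order), ("bifactor", bifactor)]:
    model = semopy.Model(desc)
    model.fit(data)                    # default ML estimation
    print(name)
    print(semopy.calc_stats(model).T)  # chi2, CFI, TLI, RMSEA, AIC, ...
```

Under this setup, measurement invariance across males and females would then be examined by fitting the retained model separately by group and progressively constraining loadings (and intercepts) to equality, checking whether fit deteriorates at each step.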
The validity of conclusions drawn from specific research studies must be evaluated in light of the purposes for which the research was undertaken. We distinguish four general types of research: description and point estimation, correlation and prediction, causal inference, and explanation. For causal and explanatory research, internal validity is critical – the extent to which a causal relationship can be inferred from the results of variation in the independent and dependent variables of an experiment. Random assignment is discussed as the key to avoiding threats to internal validity. Internal validity is distinguished from construct validity (the relationship between a theoretical construct and the methods used to operationalize that concept) and external validity (the extent to which the results of a research study can be generalized to other contexts). Construct validity is discussed in terms of multiple operations and discriminant and convergent validity assessment. External validity is discussed in terms of replicability, robustness, and relevance of specific research findings.
Governments and social scientists are increasingly developing machine learning methods to automate the process of identifying terrorists in real time and predicting future attacks. However, current operationalizations of ‘terrorist’ in artificial intelligence are difficult to justify given three issues that remain neglected: insufficient construct legitimacy, insufficient criterion validity, and insufficient construct validity. I conclude that machine learning methods should at most be used for the identification of singular individuals deemed terrorists, and not for identifying possible terrorists from some more general class, nor to predict terrorist attacks more broadly, given the intolerably high risks that result from such approaches.
Lakshmi Balachandran Nair, Libera Università Internazionale degli Studi Sociali Guido Carli, Italy; Michael Gibbert, Università della Svizzera Italiana, Switzerland; Bareerah Hafeez Hoorani, Radboud University Nijmegen, Institute for Management Research, The Netherlands
We introduce and define the single holistic case study design in this chapter. The strengths of the design are discussed in detail, with examples. In particular, we discuss the potential of the single holistic design to provide a detailed explanation of processes. Single holistic case studies also exploit the theorizing potential of unique cases, which may reveal new dimensions of a phenomenon or falsify or refute an existing theory. Other strengths we discuss include relatively high data access, construct validity, and the potential to include an unlimited number of variables. The weaknesses of the design (i.e. low internal and external validity) are discussed afterwards. The chapter also addresses some common (mis)conceptions regarding single holistic designs and their external validity.
Sound general and sports nutrition knowledge in athletes is essential for making appropriate dietary choices. Assessment of nutrition knowledge enables evaluation and tailoring of nutrition education. However, few well-validated tools are available to assess nutrition knowledge in athletes. The objective of the present study was to establish the validity of the Platform to Evaluate Athlete Knowledge Sports – Nutrition Questionnaire (PEAKS-NQ) for use with United Kingdom and Irish (UK-I) athletes. To confirm content validity, twenty-three sports nutritionists (SNs) from elite UK-I sports institutes provided feedback on the PEAKS-NQ via a modified Delphi method. After minor changes, the UK-I version of the PEAKS-NQ was administered to UK-I SNs from the British Dietetic Association Sport and Exercise Nutrition Register, and to elite athletes (EA) training at elite sports institutes in the UK and Ireland. Independent samples t tests and independent samples median tests were used to compare PEAKS-NQ total and subsection scores between EA and SN, to assess construct validity. Cronbach's alpha (good ≥ 0⋅7) was used to establish internal consistency. The SNs achieved greater overall [SN (n 23): 92⋅3 (9⋅3) v. EA (n 154): 71⋅4 (10⋅0)%; P < 0⋅001] and individual section scores (P < 0⋅001), except for Section B, Identification of Food Groups (P = 0⋅07). The largest knowledge differences between SN and EA were in Section D, Applied Sports Nutrition [SN: 88⋅5 (8⋅9) v. EA: 56⋅7 (14⋅5)%; P < 0⋅001]. The overall effect size (ES) was large (2⋅1), with subsection ESs ranging from 0⋅6 to 2⋅3. Cronbach's alpha was good (0⋅83). The PEAKS-NQ had good content and construct validity, supporting its use to assess the nutrition knowledge of UK-I athletes.
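For readers who want to reproduce this style of analysis, here is a hedged sketch of the known-groups comparison and internal-consistency check using scipy and pingouin; the file name, column layout, and item-column prefix are assumptions, not details from the study.

```python
# A hedged sketch of the known-groups comparison and internal-consistency
# check described above; variable names and the CSV layout are assumptions.
import pandas as pd
import pingouin as pg
from scipy import stats

df = pd.read_csv("peaks_nq_scores.csv")  # hypothetical: 'group' + item columns
sn = df.loc[df["group"] == "SN", "total_score"]
ea = df.loc[df["group"] == "EA", "total_score"]

# Known-groups construct validity: experts (SN) should outscore athletes (EA)
t, p = stats.ttest_ind(sn, ea, equal_var=False)
print(f"Welch t = {t:.2f}, p = {p:.4f}")

# Non-parametric alternative (Mood's median test) for skewed section scores
stat, p_med, _, _ = stats.median_test(sn, ea)
print(f"median test p = {p_med:.4f}")

# Internal consistency across questionnaire items (alpha >= 0.7 deemed good)
items = df.filter(like="item_")
alpha, ci = pg.cronbach_alpha(data=items)
print(f"Cronbach's alpha = {alpha:.2f} (95% CI {ci[0]:.2f}-{ci[1]:.2f})")
```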
Psychopathologists have failed to make significant progress toward understanding the causes of psychopathology. Despite the foundational importance of construct validity and measurement to our field, insufficient attention is paid to these concerns in the assessment of psychopathology vulnerabilities prior to their implementation in causal models. I review the current state of construct validity and measurement in psychopathology research, highlighting the lack of consensus regarding how we should define and measure vulnerability constructs. The limited capacity of open science practices to address these definitional and measurement challenges is discussed. Recommendations for progress are made, including the need for consensus agreement on (1) working definitions and (2) measures of vulnerability constructs. Other recommendations include (3) the need to incentivize ‘pre-clinical’ descriptive work focused on measurement development, (4) the formation of open-access databases designed to facilitate measurement evaluation and development, and (5) increased exploration of the use of novel technologies to facilitate the collection of high-quality measures of vulnerability.
Objective:
To evaluate the construct validity of the NIH Toolbox Cognitive Battery (NIH TB-CB) in the healthy oldest-old (85+ years old).
Method:
Our sample from the McKnight Brain Aging Registry consists of 179 individuals, 85 to 99 years of age, screened for memory, neurological, and psychiatric disorders. Following methods from previous research on adults aged 85 and older, we conducted confirmatory factor analyses on models of the NIH TB-CB and same-domain standard neuropsychological measures. We hypothesized that the five-factor model (Reading, Vocabulary, Memory, Working Memory, and Executive/Speed) would have the best fit, consistent with younger populations (an illustrative model-comparison sketch follows this abstract). We assessed convergent and discriminant validity. We also evaluated demographic and computer-use predictors of NIH TB-CB composite scores.
Results:
Findings suggest the six-factor model (Vocabulary, Reading, Memory, Working Memory, Executive, and Speed) had a better fit than alternative models. NIH TB-CB tests had good convergent and discriminant validity, though tests in the executive functioning domain had high inter-correlations with other cognitive domains. Computer use was strongly associated with higher NIH TB-CB overall and fluid cognition composite scores.
Conclusion:
The NIH TB-CB is a valid assessment for oldest-old samples, with relatively weak validity in the domain of executive functioning. The impact of computer use on composite scores could be due to the executive demands of learning to use a tablet. The strong relationships of executive function with other cognitive domains could be due to cognitive dedifferentiation. Overall, the NIH TB-CB could be useful for testing cognition in the oldest-old and the impact of aging on cognition in older populations.
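The kind of model comparison reported above typically rests on a chi-square difference (likelihood-ratio) test between nested CFA models, for example a five-factor solution against a six-factor alternative. The sketch below shows the arithmetic only; the fit values are placeholders, not statistics from this study.

```python
# Illustrative only: a chi-square difference test for nested CFA models.
# The fit values below are placeholders, not results from the study.
from scipy import stats

def chi2_difference(chi2_restricted, df_restricted, chi2_full, df_full):
    """Likelihood-ratio comparison of nested covariance-structure models."""
    delta_chi2 = chi2_restricted - chi2_full
    delta_df = df_restricted - df_full
    p = stats.chi2.sf(delta_chi2, delta_df)
    return delta_chi2, delta_df, p

# hypothetical fit statistics: five-factor (restricted) vs six-factor model
d_chi2, d_df, p = chi2_difference(chi2_restricted=612.4, df_restricted=340,
                                  chi2_full=570.1, df_full=335)
print(f"delta chi2({d_df}) = {d_chi2:.1f}, p = {p:.4f}")
```

A significant difference favors the less restricted (here, six-factor) model; information criteria such as AIC are often checked alongside.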
In this paper, we evaluate the factorial validity of the Spanish short version of the Utrecht Work Engagement Scale (UWES–9) and assess its predictive validity with respect to self-assessed work performance. A total of 229 employees from educational institutions in Ecuador participated. Using a model comparison analysis, the unidimensional model exhibited an excellent goodness of fit (χ2 = 26.176 (24), p = .344; CFI = 1.000; TLI = 1.000; RMSEA = .020; SRMR = .034); it was not improved by more complex models (three-factor model: χ2 = 22.148 (21), p = .391; CFI = 1.000; TLI = 1.000; RMSEA = .016; SRMR = .033; two-factor model: χ2 = 26.080 (23), p = .297; CFI = 1.000; TLI = 1.000; RMSEA = .025; SRMR = .034). Therefore, it is justified as a unidimensional instrument of work engagement. However, upon analyzing the correlation patterns of the overall score and the work engagement dimensions in relation to task performance, contextual performance, and counterproductive behaviors, we conclude that, while the unidimensional model exhibits a good fit, the three-factor theoretical approach is substantively superior in that it maintains differential predictive validity for each theoretical dimension.
The Rowland Universal Dementia Assessment Scale (RUDAS) is a brief cognitive test, appropriate for people with minimal formal education and sensitive to multicultural contexts. It could be a good instrument for cognitive impairment (CI) screening in Primary Health Care (PHC). It comprises the following areas: recent memory, body orientation, praxis, executive functions and language.
Research Objective:
The objective of this study is to assess the construct validity of RUDAS analysing its internal consistency and factorial structure.
Method:
Internal consistency will be calculated using ordinal Cronbach’s α, which reflects the average inter-item correlation and, as such, increases when correlations between the items increase. Exploratory factor analysis will be used to arrange the variables into domains using principal components extraction, with five factors extracted to reflect the neuropsychological areas assessed by the test; the solution will be rotated under the Varimax procedure to ease interpretation. The analysis will include the Kaiser–Meyer–Olkin measure of sampling adequacy and Bartlett’s test of sphericity. Estimations will be based on Pearson’s correlations between indicators using a principal component analysis and later replicated with a tetrachoric correlation matrix. The variance in the tetrachoric model will be analysed to identify convergent iterations and their explicative power.
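A sketch of how this planned analysis might look in Python with the factor_analyzer package follows; the data file and item columns are hypothetical, and the tetrachoric replication step would additionally require a tetrachoric/polychoric correlation routine, which is not shown.

```python
# A minimal sketch of the planned EFA using the factor_analyzer package;
# the data file and item column names are hypothetical.
import pandas as pd
from factor_analyzer import (FactorAnalyzer, calculate_kmo,
                             calculate_bartlett_sphericity)

items = pd.read_csv("rudas_items.csv")  # hypothetical: one column per item

# Sampling adequacy and sphericity checks before factoring
chi2, p = calculate_bartlett_sphericity(items)
kmo_per_item, kmo_total = calculate_kmo(items)
print(f"Bartlett chi2 = {chi2:.1f} (p = {p:.4f}), overall KMO = {kmo_total:.2f}")

# Principal-components extraction of five factors with varimax rotation
fa = FactorAnalyzer(n_factors=5, method="principal", rotation="varimax")
fa.fit(items)
loadings = pd.DataFrame(fa.loadings_, index=items.columns)
print(loadings.round(2))

# Variance explained per rotated factor
variance, proportion, cumulative = fa.get_factor_variance()
print("cumulative variance explained:", cumulative.round(2))
```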
Preliminary results of the ongoing study:
RUDAS is being administered to 321 participants older than 65 years, drawn from seven PHC physicians’ consultations in O Grove Health Center. Data collection will be finished by August 2021, and in this poster we will present the final results of the exploratory factor analysis.
Conclusions:
We expect that the results of the exploratory factor analysis will replicate previous construct validity studies of the test, in which factor weights were between 0.57 and 0.82 and the explained variance was above 40%. Confirmation that RUDAS has a strong factor structure, with high factor weights and variance ratio, and that the six-item model is appropriate for measurement would support its recommendation as a valid screening instrument for PHC.
Delay discounting paradigms have gained widespread popularity across clinical research. Given the prevalence in the field, researchers have set lofty expectations for the importance of delay discounting as a key transdiagnostic process and a ‘core’ process underlying specific domains of dysfunction (e.g. addiction). We believe delay discounting has been prematurely reified as, in and of itself, a core process underlying psychological dysfunction, despite significant concerns with the construct validity of discounting rates. Specifically, high delay discounting rates are only modestly related to measures of psychological dysfunction and therefore are not ‘core’ to these more complex behavioral problems. Furthermore, discounting rates do not appear to be specifically related to any disorder(s) or dimension(s) of psychopathology. This raises fundamental concerns about the utility of discounting, if the measure is only loosely associated with most forms of psychopathology. This stands in striking contrast to claims that discounting can serve as a ‘marker’ for specific disorders, despite never demonstrating adequate sensitivity or specificity for any disorder that we are aware of. Finally, empirical evidence does not support the generalizability of discounting rates to other decisions made either in the lab or in the real-world, and therefore discounting rates cannot and should not serve as a summary measure of an individual's decision-making patterns. We provide recommendations for improving future delay discounting research, but also strongly encourage researchers to consider whether the empirical evidence supports the field's hyper-focus on discounting.
This chapter examines experimental treatments and the theoretical, practical and empirical issues involved in their implementation. I begin by discussing the underlying purpose of experimental treatments. Second, I address what it means to say that a treatment has generalizable effects. Third, I discuss practical issues involved in constructing treatments in a variety of contexts including written, spoken, visual, and behavioral interventions. In the fourth section, I highlight the importance of validating that experimental treatments have induced the intended differences by experimental condition in the independent variable. I point to the general neglect of manipulation checks in experiments in political science and emphasize what can be learned through their inclusion. Contemporary publications provide some evidence of confusion among political scientists about the purposes for which manipulation checks and attention checks are appropriate. In the fifth and final section, I highlight the need for political scientists to move beyond between-subject assignment of treatments to consider far more powerful within-subject and hybrid experimental treatments.
Reliable and valid assessment of sports nutrition knowledge can inform athlete nutrition education to address knowledge gaps. This study aimed to test the reliability and validity of an electronically administered sports nutrition knowledge tool, the Platform to Evaluate Athlete Knowledge of Sports Nutrition Questionnaire (PEAKS-NQ). A 94-item PEAKS-NQ was piloted with 149 developmental athletes (DA) in New Zealand, with a subset invited to complete the PEAKS-NQ again to assess reliability. Reliability was evaluated using the sign test, intraclass correlation and Cronbach’s α. Accredited sports dietitians (ASD; n 255) completed the PEAKS-NQ to establish construct validity via known-groups methodology and provided relevance scores to determine the scale content validity index (S-CVI). Rasch analysis was conducted to identify potentially problematic items and test reliability. Score differences between DA and ASD were analysed using independent t or non-parametric tests. DA (n 88) were 17·8 (sd 1·4) years, 61·4 % female and mostly in high school (94·3 %). ASD (n 45) were 37·8 (sd 7·6) years, 82·2 % female, with >5 years of dietetic experience (59·1 %). ASD scored higher than DA in all sections and overall (91·5 (sd 3·4) v. 67·1 (sd 10·5) %) (P < 0·001). There were no differences between retests (n 18; P = 0·14). Cronbach’s α was 0·86. The S-CVI indicated good content validity (0·88). Rasch analysis resulted in a fifty-item PEAKS-NQ with high item (0·91) and person (0·92) reliability. The PEAKS-NQ is reliable and valid for assessing sports nutrition knowledge and could help practitioners effectively tailor and evaluate nutrition education.
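As a rough illustration of the test-retest portion of this validation, the sketch below computes an intraclass correlation with pingouin and a sign test via scipy's exact binomial test; the long-format data layout and file name are assumptions, not details from the study.

```python
# Hedged sketch of the test-retest reliability checks named above (sign test
# and intraclass correlation); the long-format layout is an assumption.
import pandas as pd
import pingouin as pg
from scipy import stats

retest = pd.read_csv("peaks_retest.csv")  # hypothetical: subject, session, score

# Intraclass correlation across the two administrations
icc = pg.intraclass_corr(data=retest, targets="subject",
                         raters="session", ratings="score")
print(icc[["Type", "ICC", "CI95%"]])

# Sign test: do scores move up as often as down between administrations?
wide = retest.pivot(index="subject", columns="session", values="score")
diffs = (wide.iloc[:, 1] - wide.iloc[:, 0]).dropna()
diffs = diffs[diffs != 0]          # ties are dropped in a sign test
n_pos = int((diffs > 0).sum())
print(stats.binomtest(n_pos, n=len(diffs), p=0.5))
```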
Flory (this volume) provides a compelling review of evidence bearing on the reliability and validity of diagnostic interviews for personality disorders (PDs). This commentary discusses several issues central to this topic, among the most important of which are: (1) the importance of distinguishing PD categories and constructs from the measures used to quantify them; and (2) the need to separate critiques of overarching conceptual frameworks (e.g., the dimensional perspective on personality) from criticisms of narrower assessment rubrics (e.g., the Five-Factor Model). Given the introspective limitations inherent in human information processing—limitations which are magnified in many forms of personality pathology—rigorous validation of PD assessment tools requires that researchers complement self-report outcome measures with behavioral and performance-based indices of personality dysfunction. To illuminate causal relationships among different features of personality pathology, researchers must use experimental methods to alter PD-related psychological processes and assess the impact of these manipulations on affect, cognition, and behavior.
The authors of this commentary aim to expand on particular points covered in the chapter by Evans, Williams and Simms, and discuss other issues that were not covered there. First, they discuss future research directions for, and the potential utility of, multisource assessments of personality pathology. Second, they emphasize the need for aspects of clinical utility within some of the reviewed measures (e.g., norms, validity scales). Third, they discuss the need for further examinations into the feasibility and utility of longitudinal assessments of personality pathology (e.g., dynamics in the context of treatment). Fourth, they describe two recent measures of personality pathology that warrant further validation. Lastly, they emphasize the need for a conceptual and measurement-based consensus regarding the multidimensional nature of personality pathology as a whole.
The paper discusses the relevance of sufficient psychometric standards for dementia rating scales. The concurrent, convergent and construct validity of the Mini Mental State Examination (MMSE), the Alzheimer's Disease Assessment Scale (ADAS) and the CAMCOG are assessed. The Clinical Global Impressions and the Global Deterioration Scale are used as global scales. The concurrent and convergent validity are satisfactory. The construct validity, expressed by the Cronbach and Loevinger coefficients, is very good for all scales and subscales. Mokken's single-item coefficients show that the MMSE has the best individual hierarchical fit; the item 'reading' can be left out. The ADAS is less uni-dimensional: eight items can be left out. The CAMCOG consists of too many items to apply Mokken's single-item coefficients or the Loevinger coefficient. Instead, the CAMCOG subscales are analyzed. This results in a possible reduction of the CAMCOG by 30 items to a total of 35 items. The factor analysis reveals two factors in both the MMSE and the ADAS, while the number of observations does not allow a factor analysis of the CAMCOG to be performed.
As in many sciences, description is an important component of theory, research, and practical applications in clinical psychology. Despite this, considerable disagreement exists regarding how to describe the diverse manifestations of psychopathology that clinicians and researchers have observed. The disagreements are such that translating research across descriptive psychopathology models can be difficult or impossible, impeding scientific progress. As this chapter reviews, at least four major descriptive psychopathology approaches exist – clinical theory, descriptive psychiatry, quantitative models, and biological models – each of which has unique goals, units of observation, theoretical concepts, and research traditions. Through reviewing these dominant approaches, it is illustrated how diverging language, concepts, and methods can impede communication between scientists and practitioners working within different descriptive approaches. Beyond this, specific emerging descriptive psychopathology models (i.e., HiTOP, RDoC, and transdiagnostic processes) are reviewed, which have primarily developed as a response to descriptive psychiatry’s limitations (e.g., DSM) and may advance clinical psychology. Despite the promise of these emerging descriptive models, each is still primarily rooted in one traditional descriptive approach and retains that approach’s limitations. Thus, the chapter concludes by discussing the need to integrate descriptive psychopathology approaches and the challenges associated with this task.
In their understandable desire to avoid the rigidity of some classification schemes, Romeijn and van Loo describe an empirically driven system for classification that emphasizes black-box prediction over questions of reduction or realism. I note that belief in diagnostic entities seems to persist even in a theoretical domain that is a-reductionist, and wonder why. The problem, I note, is very similar to the one faced by MacCorquodale and Meehl more than fifty years ago, when they were trying to extract clinical psychology from the tight strictures of operationalism. MacCorquodale and Meehl’s “hypothetical constructs” are a-realist in the same sense that Romeijn and van Loo’s prediction models are a-reductionist.
The concept of electoral competition plays a central role in many subfields of political science, but no consensus exists on how to measure it. One key challenge is how to conceptualize and measure electoral competitiveness at the district level across alternative electoral systems. Recent efforts to meet this challenge have introduced general measures of competitiveness which rest on explicit calculations about how votes translate into seats, but also implicit assumptions about how effort maps into votes (and how costly effort is). We investigate how assumptions about the effort-to-votes mapping affect the units in which competitiveness is best measured, arguing in favor of vote-share-denominated measures and against vote-share-per-seat measures. Whether elections under multimember proportional representation systems are judged more or less competitive than single-member plurality or runoff elections depends directly on the units in which competitiveness is assessed (and hence on assumptions about how effort maps into votes).
Nutrient profiling (NP) is a method for evaluating the healthfulness of foods. Although many NP models exist, most have not been validated. This study aimed to examine the content and construct/convergent validity of five models from different regions: Australia/New Zealand (FSANZ), France (Nutri-Score), Canada (HCST), Europe (EURO) and Americas (PAHO). Using data from the 2013 UofT Food Label Information Program (n 15 342 foods/beverages), construct/convergent validity was assessed by comparing the classifications of foods determined by each model to a previously validated model, which served as the reference (Ofcom). The parameters assessed included associations (Cochran–Armitage trend test), agreement (κ statistic) and discordant classifications (McNemar’s test). Analyses were conducted across all foods and by food category. On the basis of the nutrients/components considered by each model, all models exhibited moderate content validity. Although positive associations were observed between each model and Ofcom (all Ptrend < 0·001), agreement with Ofcom was ‘near perfect’ for FSANZ (κ=0·89) and Nutri-Score (κ=0·83), ‘moderate’ for EURO (κ=0·54) and ‘fair’ for PAHO (κ=0·28) and HCST (κ=0·26). There were discordant classifications with Ofcom for 5·3 % (FSANZ), 8·3 % (Nutri-Score), 22·0 % (EURO), 33·4 % (PAHO) and 37·0 % (HCST) of foods (all P<0·001). Construct/convergent validity was confirmed between FSANZ and Nutri-Score v. Ofcom, and to a lesser extent between EURO v. Ofcom. Numerous incongruencies with Ofcom were identified for HCST and PAHO, which highlights the importance of examining classifications across food categories, the level at which differences between models become apparent. These results may be informative for regulators seeking to adapt and validate existing models for use in country-specific applications.
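For orientation, the following sketch computes two of the agreement statistics named above, Cohen's κ and McNemar's test, on simulated binary classifications from a hypothetical candidate model against a reference; it is not the study's code, and the Cochran–Armitage trend test is omitted.

```python
# A small illustration (not the study's code) of agreement statistics on
# hypothetical binary classifications ("healthier" vs "less healthy") from
# two nutrient-profiling models.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.contingency_tables import mcnemar

rng = np.random.default_rng(0)
ofcom = rng.integers(0, 2, size=500)          # reference model's classification
candidate = np.where(rng.random(500) < 0.9,   # agrees ~90% of the time
                     ofcom, 1 - ofcom)

# Chance-corrected agreement between the two classifications
print(f"kappa = {cohen_kappa_score(ofcom, candidate):.2f}")

# McNemar's test on the 2x2 cross-classification: are the discordant
# classifications symmetric?
table = np.zeros((2, 2), dtype=int)
for a, b in zip(ofcom, candidate):
    table[a, b] += 1
print(mcnemar(table, exact=False, correction=True))
```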