Safe and valid? A systematic review of the psychometric properties of culturally adapted depression scales for use among Indigenous populations

Michelle Yang; Quinta Seon; Liliana Gomez Cardona; Maharshee Karia; Gajanan Velupillai; Valérie Noel; Outi Linnaranta

doi:10.1017/gmh.2023.52


Published online by Cambridge University Press: 14 September 2023


Michelle Yang: Affiliation:
École interdisciplinaire des sciences de la santé/Interdisciplinary School of Health Sciences, Université d’Ottawa/University of Ottawa, Ottawa, ON, Canada
Quinta Seon: Affiliation:
Department of Psychiatry, Douglas Mental Health University Institute, Montreal, QC, Canada Department of Psychiatry, McGill University, Montreal, QC, Canada
Liliana Gomez Cardona: Affiliation:
Department of Psychiatry, Douglas Mental Health University Institute, Montreal, QC, Canada Department of Psychiatry, McGill University, Montreal, QC, Canada
Maharshee Karia: Affiliation:
Department of Psychiatry, Douglas Mental Health University Institute, Montreal, QC, Canada
Gajanan Velupillai: Affiliation:
Department of Psychiatry, McGill University, Montreal, QC, Canada
Valérie Noel: Affiliation:
Department of Psychiatry, Douglas Mental Health University Institute, Montreal, QC, Canada ACCESS Open Minds, Centre de recherche Douglas/Perry 3, Montreal, QC, Canada
Outi Linnaranta*: Affiliation:
Department of Psychiatry, Douglas Mental Health University Institute, Montreal, QC, Canada Department of Psychiatry, McGill University, Montreal, QC, Canada Department of Equality, Finnish Institute for Health and Welfare, Helsinki, Finland
*: Corresponding author: Outi Linnaranta


Abstract

Background

Implementing culturally sensitive psychometric measures of depression may be an effective strategy to improve acceptance, response rate, and reliability of psychological assessment among Indigenous populations. However, the psychometric properties of depression scales after cultural adaptation remain unclear.

Methods

We screened the Ovid Medline, PubMed, Embase, Global Health, PsycInfo, and CINAHL databases through three levels of search terms: Depression, Psychometrics, and Indigenous, following the PRISMA guidelines. We assessed metrics for reliability (including Cronbach’s alpha), validity (including fit indices), and clinical utility (including predictive value).

Results

Across the 31 studies included in the review, 13 different depression scales were adapted through language or content modification. Sample populations included Indigenous peoples from the Americas, Asia, Africa, and Oceania. Most cultural adaptations had strong psychometric properties; however, few properties were reported, and reporting was inconsistent. Where available, alphas, inter-rater and test–retest reliability, construct validity, and incremental validity often indicated increased cultural sensitivity of adapted scales. There were mixed results for clinical utility, criterion validity, cross-cultural validity, sensitivity, specificity, area under the receiver operating characteristic curve, predictive value, and likelihood ratio.

Conclusions

Modifications to increase cultural relevance have the potential to improve the fit and acceptance of a scale in Indigenous populations; however, these changes may decrease specificity and negative predictive value. There is an urgent need for useful and reliable tools to identify Indigenous individuals who require clinical treatment for depression. Future work should aim for optimal specificity and validated cut-off points that take into account the high prevalence of depression in these populations.

Keywords

psychometrics; Indigenous; validity; reliability; clinical utility

Type: Overview Review
Information: Cambridge Prisms: Global Mental Health , Volume 10 , 2023 , e60

DOI: https://doi.org/10.1017/gmh.2023.52
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0), which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
Copyright: © The Author(s), 2023. Published by Cambridge University Press

Impact statement

The present study suggests that modifying depression scales to fit the Indigenous context through changes to language or question structure is a culturally sensitive strategy that increases acceptance of psychological evaluation and treatment in communities. However, increasing acceptability must be balanced with maintaining clinical utility of instruments. The high prevalence of depression in these populations must be taken into account when developing culturally sensitive but specific tools.

Introduction

Cultural safety is a combination of cultural awareness (acknowledging the differences between cultures), cultural sensitivity (respecting other cultures), and cultural competency (working effectively with diverse populations through appropriate behaviors, attitudes, measures, and policies; Marsella et al., 1985; Simon and Catherine, 2009). Research has shown that culturally competent care can improve communication between minority groups and health care professionals, including the recognition of mental illness and the assessment of its severity among minority groups (Schouten and Meeuwesen, 2006). These considerations are especially important in mental health care for Indigenous peoples, who share cultural backgrounds, ideas about health, and concerns that differ from those of other ethnic groups (Mayberry et al., 2000). In response to these distinct perspectives, cultural adaptation of psychometric tools has become a common method to increase the cultural safety of psychometric evaluation and to reduce communication problems (Gomez Cardona et al., submitted). This, in turn, is hoped to reduce the risk of harm and to improve Indigenous peoples’ access to psychiatric treatment for depression. Screening for symptoms with appropriate psychometric scales in community services is one approach to improving the quality of treatment for some Indigenous populations (Esler et al., 2007).

Measurement-based psychiatric care uses standardized measures to guide treatment and subsequently evaluate treatment outcomes (Aboraya et al., 2018). Validated psychometric measures can be used in first-point-of-contact settings (i.e., primary care) to screen for psychiatric disorders, assess the need for treatment, monitor symptom severity, and track treatment outcomes (Porcerelli and Jones, 2017). Indigenous communities often lack specialized resources, including staff with specialized psychiatric skills (Boksa et al., 2015). Many studies have reported a dearth of trained mental health workers of local Indigenous origin; moreover, high turnover among non-Indigenous health workers leads to a lack of continuity of services, weak connections to specialized services, and excessively long wait lists for Indigenous people with severe mental illness (Boksa et al., 2015). As a result, access to interview-based psychiatric diagnosis and to specialist follow-up for relapses after treatment is limited (Boksa et al., 2015). Moreover, trauma-informed care is rare, yet it is needed for safe and valid psychiatric assessment and intervention for Indigenous peoples. This matters because Indigenous people’s health promotion and health-seeking behaviors are largely influenced by a colonial past that has caused intergenerational social inequities; a lack of trust and confidence in many governments still prevails among Indigenous populations, making them less likely to be screened and contributing to an overall reluctance to engage with the healthcare system (Leung, 2016).
Because compromised quality of mental health care is evident in many Indigenous communities, there is a need for effective symptom screening and monitoring with stable psychometric measures (Chan et al., 2021).

In addition to evaluating the need for treatment at the individual level, culturally safe and accurate measurement of a population’s mental health is essential for appropriate resource allocation and service planning at the population and community levels (Chan et al., 2021). The use of psychometric screens at the community level can raise awareness of mental health needs and crises among stakeholders in Indigenous health and provide tools for evaluating the efficacy of interventions (Chapla et al., 2019). To draw reliable conclusions about the efficacy of screening at community and population levels, psychometric tools must be culturally safe and trauma-informed, but also clinically useful, reliable when used with a specific population, and of high quality compared with traditional gold standards (Chan et al., 2021).

Psychometric screens and outcome measures are expected to meet standard quality criteria, and these criteria are equally essential for culturally adapted depression scales. First, reliability represents a test’s consistency across test questions (Andrade, 2018). Second, validity represents a test’s accuracy, such that test items are meaningful and relevant to the population with which they are used (Andrade, 2018). Finally, the clinical utility of a scale indicates its usefulness to clinicians in diagnosing and determining the content of treatment (Labrique and Pan, 2010). In particular, the sensitivity of a screen indicates its capacity to correctly identify people who are most likely to benefit from a clinical diagnostic interview (Parikh et al., 2008). Accordingly, an optimal culturally adapted psychometric scale should not exclude from clinical interviews those who are depressed (sensitivity), but should also guide an efficient use of clinician resources by directing interviews toward individuals who require treatment (specificity).
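The clinical-utility metrics discussed here are all computed from a 2 × 2 table that cross-classifies screen results against a reference diagnosis. The sketch below is illustrative only: the `ConfusionCounts` class, the `accuracy_metrics` function, and the counts passed to it are hypothetical and are not drawn from any reviewed study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Screen results cross-classified against a reference diagnosis (hypothetical)."""
    tp: int  # screen positive, reference diagnosis present
    fp: int  # screen positive, reference diagnosis absent
    fn: int  # screen negative, reference diagnosis present
    tn: int  # screen negative, reference diagnosis absent


def accuracy_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Sensitivity, specificity, predictive values, and likelihood ratios at one cut-off."""
    sensitivity = c.tp / (c.tp + c.fn)  # share of true cases the screen flags
    specificity = c.tn / (c.tn + c.fp)  # share of non-cases the screen clears
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": c.tp / (c.tp + c.fp),  # P(diagnosis | positive screen)
        "npv": c.tn / (c.tn + c.fn),  # P(no diagnosis | negative screen)
        "lr_pos": sensitivity / (1 - specificity),  # undefined if specificity == 1
        "lr_neg": (1 - sensitivity) / specificity,  # undefined if specificity == 0
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from any reviewed study.
    print(accuracy_metrics(ConfusionCounts(tp=40, fp=30, fn=10, tn=120)))
```

With these hypothetical counts, sensitivity and specificity are both 0.80, the PPV is about 0.57, and the NPV is about 0.92. A real analysis would also need to handle the division-by-zero cases noted in the comments.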

The prevalence of depression and anxiety, and the incidence of suicide, among Indigenous peoples are commonly high in comparison with Western cohorts (Shen et al., 2018). A heightened presence of environmental stress and distress is known to disproportionately increase the number of positive results on a screening tool, even among people for whom psychiatric care is not indicated or appropriate (Simon, 2015). As such, in Indigenous communities, the proportion of positive cases identified by a highly sensitive scale may misleadingly suggest that the entire population needs a clinical interview (Parikh et al., 2008). Therefore, the optimal balance of psychometric properties must be thoroughly tested in Indigenous communities before scales are used to guide treatment.
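To illustrate why prevalence matters, the sketch below computes, for a fixed scale, the expected share of a population that screens positive, along with the PPV and NPV, at different prevalences. The helper `screening_outcomes` and the sensitivity and specificity values (0.90 and 0.70) are assumptions chosen for illustration, not figures from any reviewed study.

```python
def screening_outcomes(sensitivity: float, specificity: float,
                       prevalence: float) -> dict[str, float]:
    """Expected screening results, as population fractions, for a given prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return {
        # Everyone who screens positive would be referred for a clinical interview.
        "positive_rate": true_pos + false_pos,
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


if __name__ == "__main__":
    # Hypothetical scale with 90% sensitivity and 70% specificity.
    for prev in (0.05, 0.20, 0.40):
        r = screening_outcomes(0.90, 0.70, prev)
        print(f"prevalence={prev:.0%}  positive_rate={r['positive_rate']:.0%}  "
              f"PPV={r['ppv']:.2f}  NPV={r['npv']:.2f}")
```

With these assumed values, the share screening positive rises from about 33% at 5% prevalence to about 54% at 40% prevalence. Over the same range the PPV rises and the NPV falls, even though sensitivity and specificity are unchanged.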

Recently, more studies have explored different methods of culturally adapting and developing measures that reflect mental health conditions (Haswell et al., 2010). We recently reviewed the methods of cultural adaptation of measures for depression, identifying the modifications and adaptations made and evaluating their acceptability to the target Indigenous populations (Gomez Cardona et al., submitted). Here, we continue this work and investigate the psychometric properties of the previously identified culturally adapted psychometric measures. In this review, we assess the reported quality of the psychometric characteristics of adapted depression scales and their utility for psychological evaluation among Indigenous groups.

Methods

The methods for the systematic search, including the search strategy, were reported previously (Gomez Cardona et al., submitted). This study followed the Preferred Reporting Items for Systematic Reviews and Meta-analyses (PRISMA) guidelines (Figure 1). We searched the Ovid Medline, PubMed, PsycInfo, Embase, CINAHL, and Global Health databases for articles using three levels of search terms related to: a) Depression, b) Psychometrics, and c) Indigenous (Supplementary Material 1). After an initial search capturing articles from the inception of the databases to April 2021, we extended the search to the end of August 2022. Reports with information on psychometric properties that appeared after data extraction from these original studies were added to the current analysis as gray literature.

Figure 1. PRISMA flow diagram.

We extracted information on the utility of depression scales following cultural adaptation, including the “gold standards” against which they were measured and the optimal cut-off point(s) determined after cultural adaptation (Table 1). We also extracted information on the psychometric properties of adapted scales (Table 2), which were assessed with a quality criteria checklist (Supplementary Material 2). We extracted data on reliability (internal consistency, test–retest reliability, and inter-rater reliability), criterion validity (concurrent and predictive validity), construct validity (convergent and discriminant validity), cross-cultural validity (measurement invariance), incremental validity, and clinical utility (sensitivity, specificity, area under the ROC curve [AUC], positive predictive value [PPV], negative predictive value [NPV], and likelihood ratio [LR]). Detailed results on trends found across unique adaptation processes are presented in Supplementary Material 3.

Table 1. Characteristics of culturally adapted scales

Abbreviations: AUDIT, Alcohol Use Disorders Identification Test; BDI, Beck Depression Inventory; CES-D, Center for Epidemiologic Studies Depression Scale; CIDI, Composite International Diagnostic Interview; CQ, Coping Questionnaire; DSM, Diagnostic and Statistical Manual of Mental Disorders; DSQ, Dar-es-Salaam Symptom Questionnaire; EPDS, Edinburgh Postnatal Depression Scale; FAI, Functioning Assessment Instrument; GDS, Geriatric Depression Scale; HIV, human immunodeficiency virus infection; HSCL, Hopkins Symptom Checklist Depression Scale; HTQ, Harvard Trauma Questionnaire; ICD-10, International Classification of Diseases-10; IDSS-G, International Depression Symptom Scale-General; K-5/6/10, Kessler Psychological Distress Scale 5-/6-/10 item; KICA-dep, Kimberley Indigenous Cognitive Assessment of Depression; KMMS, Kimberley Mum’s Mood Scale; MDD, major depressive disorder; MHAP-I, Mental Health Assessment Project Instrument; MINI(-KID), Mini International Neuropsychiatric Interview (-Children and Adolescents); NOK, Ndetei–Othieno–Kathuku Scale; PAS, psychiatric assessment schedule; PHQ, Patient Health Questionnaire; SCAN, Semi-Structured Schedules for Clinical Assessment in Neuropsychiatry Interview; SCID, Structured Clinical Interview for DSM Disorders; SRQ, Self-Reporting Questionnaire; SRT, Symptom Rating Test; WHO-CIDI v3, World Health Organization Composite International Diagnostic Interview.

Scales: light blue: CES-D; dark blue: EPDS; light purple: PHQ; dark purple: NOK; light gray: FAI; dark gray: SRQ; light orange: KICA; dark orange: Kessler; light pink: HSCL; dark pink: KMMS; light turquoise: IDSS-G; dark turquoise: GDS; brown: DSQ.

Table 2. Psychometric properties of adapted scales

Abbreviations: AUC, area under the receiver operating characteristic curve; EVA, eigenvalue(s); ICC, intraclass correlation coefficient; FL, factor loading(s); LR, likelihood ratio; NPV, negative predictive value; PPV, positive predictive value.

Quality of values (Supplementary Material 2): red: poor psychometric qualities; yellow: moderate psychometric qualities; green: strong psychometric qualities.

Modifications to scale: +/−, added/deleted; i, suicidal ideation items; h, hope items; l, loneliness items; s, somatic difficulty items; a, anger items; m, simplified language; d, local idioms of distress; p, changed scale administration protocol; t, translated; f, rephrased; *, other.

Assessment of quality

We followed the guidelines of the ROBIS tool to assess the quality and risk of bias of this review (Whiting et al., 2016; Supplementary Material 4). Here, we report potential biases across four domains: a) study eligibility criteria, b) identification and selection of studies, c) data collection and study appraisal, and d) synthesis and findings.

Results

Description of the adapted scales

Originally, we had identified 37 studies that met the criteria in the systematic search (Gomez Cardona et al., 2021). Thirty-one (83.8%) of these studies reported results on the validation of their scales’ psychometric properties. Cohort sizes ranged from n = 97 to n = 4,767. Target Indigenous peoples were native to Canada or the United States (5/31), Latin America (3/31), Asia (8/31), Africa (8/31), and Australia or New Zealand (7/31); many populations lived in rural settings. Each of the 31 studies produced a unique culturally adapted scale based on its individual methods. These 31 scales were, or were variations of, the following scales, listed in decreasing order of the number of adaptations per scale: Center for Epidemiologic Studies Depression Scale (CES-D), n = 8; Patient Health Questionnaire (PHQ-9), n = 7; Edinburgh Postnatal Depression Scale (EPDS), n = 4; Kessler Psychological Distress Scale, n = 4; Hopkins Symptom Checklist Depression Scale (HSCL), n = 3; Geriatric Depression Scale (GDS), n = 2; Kimberley Indigenous Cognitive Assessment of Depression (KICA-dep), n = 1; Ndetei–Othieno–Kathuku scale (NOK), n = 1; International Depression Symptom Scale-General (IDSS-G), n = 1; Dar-es-Salaam Symptom Questionnaire (DSQ), n = 1; Kimberley Mum’s Mood Scale (KMMS), n = 1; Self-Reporting Questionnaire (SRQ), n = 1; Functioning Assessment Instrument (FAI), n = 1.

Reliability

Cronbach’s alpha was the most commonly analyzed psychometric property, reported for 77% (24/31) of adapted depression measurement scales (Ganguli et al., 1999; Tiburcio Sainz and Natera Rey, 2007; Bass et al., 2008; Campbell et al., 2008; Esler et al., 2008; Kaaya et al., 2008; Ekeroma et al., 2012; Gelaye et al., 2013; Armenta et al., 2014; Haroz et al., 2014; McNamara et al., 2014; Schneider et al., 2015; Bougie et al., 2016; Baron et al., 2017; Denckla et al., 2017; Haroz et al., 2017; Marley et al., 2017; Schantz et al., 2017; Gallis et al., 2018; Harry and Crea, 2018; Kilburn et al., 2018; Ashaba et al., 2019; Chapla et al., 2019; Hackett et al., 2019). All but four studies reported high alpha values; the four moderate alphas were found for adaptations of the CES-D, HSCL, and FAI (Haroz et al., 2014; Schneider et al., 2015; Kilburn et al., 2018; Chapla et al., 2019). These findings indicate good internal consistency of the items within the adapted measures.
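Cronbach’s alpha compares the sum of the item variances with the variance of the total score: alpha = k/(k − 1) × (1 − Σ item variances / total-score variance), where k is the number of items. The sketch below assumes complete responses stored as a respondents × items list of lists; the function name and example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix with no missing values."""
    k = len(responses[0])                    # number of items
    item_columns = list(zip(*responses))     # one tuple of scores per item
    sum_item_var = sum(variance(col) for col in item_columns)
    total_var = variance([sum(row) for row in responses])
    # Sample variances are used throughout; the (n - 1) factors cancel in the ratio.
    return k / (k - 1) * (1 - sum_item_var / total_var)
```

For example, `cronbach_alpha([[1, 2], [2, 1], [3, 3]])` returns about 0.67. Thresholds for calling an alpha “moderate” or “high” follow the quality criteria in Supplementary Material 2, not this code.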

Intraclass correlation (ICC), item–item correlation, and item–total correlation are alternatives to alpha as measures of reliability following adaptation. Six studies reported moderate values for these metrics for the KICA-dep, K-10, EPDSb, NOK, DSQ, and K-5 (Campbell et al., 2008; Kaaya et al., 2008; Almeida et al., 2014; McNamara et al., 2014; Bougie et al., 2016; Denckla et al., 2017). Four studies demonstrated high inter-rater and test–retest reliability (retest intervals of 2 days to 1 week; Kaaya et al., 2008; Gelaye et al., 2013; Haroz et al., 2014, 2017).

Cross-cultural validity

Measurement invariance testing was conducted in 4/31 studies (12.9%; Chapleski et al., 1997; McNamara et al., 2014; Harry and Crea, 2018; Kilburn et al., 2018). These tests used multigroup confirmatory factor analysis (CFA) to provide evidence for cross-cultural validity. These studies showed that some adapted scales were invariant across different groups, such as members of the same Indigenous Nation living in different residential settings (i.e., urban, rural off-reservation, and reservation; Chapleski et al., 1997).

Criterion (concurrent and predictive) validity

Concurrent and predictive (criterion) validity was determined in 11/31 (35.5%) studies. Evidence for concurrent and predictive validity was reported through correlation analysis between scale ratings and ratings from the Diagnostic Interview Schedule or a well-established depression measure in the respective location (Ganguli et al., 1999; Campbell et al., 2008; Kaaya et al., 2008; Fernandes et al., 2011; Ekeroma et al., 2012; Gelaye et al., 2013; Armenta et al., 2014; Haroz et al., 2014; Denckla et al., 2017; Haroz et al., 2017; Schantz et al., 2017).
Seven of the 11 studies (63.6%) measuring criterion validity demonstrated that the scale adaptations resulted in high concordance with a psychiatric diagnosis of major depressive disorder (MDD) or a gold-standard measure of distress (Campbell et al., 2008; Kaaya et al., 2008; Fernandes et al., 2011; Gelaye et al., 2013; Armenta et al., 2014; Haroz et al., 2017; Schantz et al., 2017). However, 4/11 (36.4%) reports showed poor evidence for criterion validity (Kaaya et al., 2008; Ekeroma et al., 2012; Haroz et al., 2014; Denckla et al., 2017).
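When a dichotomized screen result is compared with a reference diagnosis such as MDD, one common chance-corrected index of concordance is Cohen’s kappa. The sketch below is an illustration of that index only. The review does not state that the included studies used kappa, and the function name and counts are hypothetical.

```python
def cohens_kappa(tp: int, fp: int, fn: int, tn: int) -> float:
    """Cohen's kappa for agreement between a dichotomized screen and a reference diagnosis."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n  # raw proportion of agreement
    # Agreement expected by chance, from the marginal totals of the 2 x 2 table.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (observed - expected) / (1 - expected)
```

With hypothetical counts of 40 true positives, 30 false positives, 10 false negatives, and 120 true negatives, raw agreement is 0.80, but kappa is only about 0.53 because a good part of that agreement is expected by chance.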

Construct (convergent and discriminant) validity

Convergent and discriminant (construct) validity was measured through CFA, exploratory factor analysis (EFA), correlation analysis, or multivariate regression in 18/31 (58%) studies (Chapleski et al., 1997; Ganguli et al., 1999; Tiburcio Sainz and Natera Rey, 2007; Bass et al., 2008; Kaaya et al., 2008; Gelaye et al., 2013; Armenta et al., 2014; McNamara et al., 2014; Schneider et al., 2015; Bougie et al., 2016; Baron et al., 2017; Denckla et al., 2017; Haroz et al., 2017; Schantz et al., 2017; Harry and Crea, 2018; Kilburn et al., 2018; Ashaba et al., 2019; Chapla et al., 2019). Most of these studies showed evidence of high construct validity, and most adaptation processes improved the scales’ ability to capture globally meaningful constructs of depression.

Incremental validity

Only 2/31 (6.5%) studies examined incremental validity through regression modeling (Mitchell and Beals, 2011; Haroz et al., 2017). Strong incremental validity suggested that these adapted scales predicted the severity of outcomes among the Indigenous population better than existing measures. In the two studies, the predicted outcomes included lifetime mood disorders, physical diagnosis, alcohol use, and impaired functioning (Mitchell and Beals, 2011; Haroz et al., 2017).

Clinical utility

The cut-off score of an adapted scale determines its sensitivity, specificity, PPV, and NPV, whereas the AUC summarizes discrimination across all possible cut-offs. These values must be strong for an adapted scale to capture the true prevalence of depression in the Indigenous group. Fourteen studies (14/31, 45.2%) reported the discrimination properties of the scale following cultural adaptation, as determined by the AUC (Tiburcio Sainz and Natera Rey, 2007; Bass et al., 2008; Fernandes et al., 2011; Ekeroma et al., 2012; Gelaye et al., 2013; Almeida et al., 2014; Sarkar et al., 2015; Baron et al., 2017; Haroz et al., 2017; Marley et al., 2017; Gallis et al., 2018; Ashaba et al., 2019; Hackett et al., 2019; Caneo et al., 2020).
Of these, 4/14 (28.6%) scales had strong discrimination and could distinguish between cases and non-cases of depression in the specific community (Fernandes et al., 2011; Baron et al., 2017; Gallis et al., 2018; Caneo et al., 2020). Nine studies (9/31, 29.0%) reported the PPV and NPV of the adapted instrument (Husain et al., 2006; Esler et al., 2008; Ekeroma et al., 2012; Gelaye et al., 2013; Almeida et al., 2014; Baron et al., 2017; Marley et al., 2017; Gallis et al., 2018; Hackett et al., 2019).
Of these, only one scale had a high PPV (Ekeroma et al., 2012), but 8/9 (88.9%) had a high NPV (Husain et al., 2006; Esler et al., 2008; Ekeroma et al., 2012; Gelaye et al., 2013; Almeida et al., 2014; Marley et al., 2017; Gallis et al., 2018; Hackett et al., 2019).

Three studies reported the LR and found moderate LRs following adaptation (Gelaye et al., 2013; Ashaba et al., 2019; Hackett et al., 2019).
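For readers less familiar with these metrics, the sketch below shows how sensitivity, specificity, PPV, NPV, and the positive and negative LRs all follow from a single 2×2 table that cross-classifies screening results against a reference-standard diagnosis. The function name and the counts are hypothetical, chosen only to mimic the pattern described above (high sensitivity and NPV, low PPV); they are not taken from any included study.

```python
def screening_metrics(tp, fp, fn, tn):
    """Accuracy of a screen against a reference-standard diagnosis (illustrative helper).

    tp/fn: people with depression who screen positive/negative;
    fp/tn: people without depression who screen positive/negative.
    """
    sensitivity = tp / (tp + fn)  # share of true cases flagged by the screen
    specificity = tn / (tn + fp)  # share of non-cases correctly screened out
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),  # share of positive screens who truly have depression
        "npv": tn / (tn + fn),  # share of negative screens who truly do not
        # LR+ and LR-: how strongly a positive or negative screen shifts the odds of depression
        "lr_positive": sensitivity / (1 - specificity) if specificity < 1 else float("inf"),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical results for 200 people who completed both the screen and a diagnostic interview.
metrics = screening_metrics(tp=36, fp=54, fn=4, tn=106)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these invented counts the screen catches 90% of cases and its NPV exceeds 0.95, yet only 40% of positive screens are true cases, which is the high-NPV, low-PPV pattern seen in most adapted scales.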

Scale performance

Differences in adaptation methods across studies yielded differences in scale performance after adaptation (Table 2). Several patterns across scales suggest that the adaptation methods affected the psychometric properties of the scales. The adapted CES-D (n = 8) was a particularly strong scale, with evidence for its reliability, validity, and clinical utility across adaptations. The PHQ (n = 7) performed poorly on specificity and PPV; however, it had excellent internal consistency, sensitivity, NPV, and construct validity. The Kessler scales (n = 4) performed well across all validity tests and had high internal consistency. The EPDS (n = 4) had good criterion validity, PPV, and NPV, but only moderate sensitivity, specificity, and discrimination. The HSCL (n = 3) had good internal consistency and construct validity, but poor criterion validity and clinical utility metrics. The GDS (n = 2) did not have high sensitivity or specificity. The NOK (n = 1) had moderate internal consistency and high construct validity but poor criterion validity. The IDSS-G (n = 1) had excellent reliability metrics and incremental validity, but did not have high specificity or discrimination. The DSQ (n = 1) had high reliability but poor validity. The KMMS (n = 1) had high internal consistency, sensitivity, specificity, and NPV, but poor PPV. The FAI (n = 1) had high construct validity and moderate internal consistency. The adapted KICA-DEP (n = 1) showed acceptable internal consistency, specificity, and NPV.

Discussion

In this review, we synthesized the global evidence on the psychometric properties of depression scales culturally adapted for Indigenous peoples. Many of the processes used to develop and adapt such instruments succeeded in improving the measures’ reliability (internal consistency, test–retest reliability, and inter-rater reliability), convergent validity (construct and discriminant), and incremental validity. These processes included adding or deleting items, translation, and incorporating local idioms of distress. However, cultural adaptation methods had less success in improving criterion validity (concurrent and predictive) and cross-cultural validity (measurement invariance). Additionally, the adapted screening instruments were typically highly sensitive, which means they were useful for identifying individuals who might be depressed. Conversely, the specificity of many instruments was low. Although most scales were acceptable to the population, low-specificity tools may add little value for clinicians screening within populations with a high prevalence of depression.

Some depression scales might be more globally suitable for use among Indigenous peoples than others. Across the included studies, researchers adapted 13 different original depression scales. Information on selection criteria and processes remains insufficient to conclude whether a higher number of reports indicates positive or negative characteristics of a scale. In this review, the adapted CES-D outperformed the other scales after adaptation, as it showed the most uniformly high psychometric properties.

A gap in reporting characteristics

Testing of quality metrics was sporadic; only 2/31 studies tested the majority of quality metrics (Gelaye et al., 2013; Haroz et al., 2017). Of the 15 psychometric characteristics used to assess the effects of cultural adaptation, reliability and construct validity were two of the most commonly tested. In contrast, clinical utility metrics, including sensitivity, specificity, PPV, NPV, and LR, were measured less often. These results indicate a continued gap in knowledge about the performance of adapted scales with Indigenous populations, namely, how different adaptation processes benefit psychological testing in these communities. As such, we cannot draw strong conclusions about the clinical use of culturally adapted scales, nor about individual methods of cultural adaptation that maintain reliability and clinical utility.

Increasing reliability and validity through cultural adaptations

Most reliability metrics of adapted scales were excellent, particularly alpha values. This finding suggests that, following adaptation, scale items were relatively consistent and instructions for scoring the scales were unambiguous. Yet a limitation of using alpha to determine internal consistency is that higher alphas do not necessarily indicate a higher-quality scale (Sijtsma, 2009). Exceptionally high alpha values (>0.9) can result when the adaptation process adds length or redundant items to the original scale, or when the modifications narrow the crucial constructs measured by the scale (Panayides and Walker, 2013). Some studies assumed that the developed scale was more reliable because of a high alpha value. In some cases, researchers even relied on alpha values to remove items that did not correlate well; however, alpha was not developed for this purpose (Cartagena-Ramos et al., 2018).
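To illustrate how redundant items can inflate alpha, the sketch below computes Cronbach’s alpha with the standard formula, α = k/(k − 1) · (1 − Σσ²ᵢ/σ²ₜₒₜₐₗ), on hypothetical item scores (not data from any included study). Repeating two items, a stand-in for near-duplicate wording added during adaptation, pushes alpha above 0.9 even though no new information about depression is captured.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; `items` holds one list of scores per item, one score per respondent."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(items[0])
    if any(len(item) != n for item in items):
        raise ValueError("every item needs one score per respondent")
    item_variance_sum = sum(variance(item) for item in items)
    totals = [sum(scores) for scores in zip(*items)]  # total score per respondent
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses: 4 items scored 0-3 by 6 respondents (illustrative only).
base_items = [
    [0, 1, 2, 3, 2, 1],
    [1, 1, 2, 3, 3, 0],
    [0, 2, 1, 3, 2, 1],
    [1, 0, 2, 2, 3, 1],
]
# A padded version that simply repeats items 1 and 2 (redundant content).
padded_items = base_items + [base_items[0], base_items[1]]

print(f"original alpha: {cronbach_alpha(base_items):.2f}")  # 0.88
print(f"padded alpha:   {cronbach_alpha(padded_items):.2f}")  # 0.95
```

The rise from about 0.88 to about 0.95 comes purely from duplication, which is why a very high alpha after adaptation is better read as a prompt to check for redundancy than as proof of a better scale.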

Most studies did not assess multiple types of validity. Based on the few positive results, some adaptation processes improved understanding and acceptance among the Indigenous population. Many scales had high construct validity, showing that the factors of the scales represented constructs of depression known to the specific Indigenous group, such as affective or somatic symptoms. A few studies also found acceptable criterion validity, showing that the scales agreed with established criteria for depression used by gold-standard instruments in populations where the original scale is used. Measuring the incremental validity of scales was important to understand whether adaptations, such as incorporating local idioms of distress, predicted outcomes above and beyond previously established Western measures. A few scales with high incremental validity predicted functional impairment beyond the scores on non-adapted measures; they showed which scores most accurately indicated mental health concerns in that population.

Measurement invariance

A scale’s measurement invariance, or cross-cultural validity, indicates whether the scale functions equivalently across groups, for example, whether different populations endorse scale items in the same way (Bader et al., 2021). Some scales proved to be measurement invariant following adaptation, meaning that different cultural groups interpreted the constructs of depression in the scale in a conceptually similar way. Unfortunately, most studies did not compare the scale’s properties between Indigenous and non-Indigenous groups, or between different Indigenous groups (Bougie et al., 2016; Kilburn et al., 2018). In fact, without invariance testing, results might not even generalize to the same Indigenous group living on reserves or in rural areas (Bougie et al., 2016). Therefore, it remains inconclusive whether these adaptation processes increased cultural safety without sacrificing the accuracy of the original scale. An adapted scale should ultimately balance measurement invariance and diagnostic ability to be useful.

Clinical utility

For clinical utility, future studies should consider that the prevalence of depression is high in many Indigenous communities. Increasing the acceptance of a screening tool through cultural adaptation may inadvertently increase its sensitivity by lowering the threshold for being considered a positive case (Shen et al., 2018). Specifically, we found that the sensitivity of adapted scales was often considerably higher than their specificity. PPV and NPV represent the percentage of individuals screening positive or negative who truly do or do not have a depressive disorder, respectively; because both depend on prevalence as well as on sensitivity and specificity, the trend of high NPV and low PPV among the scales may be partly explained by prevalence rates (Simon, 2015). The findings suggest that cultural adaptation can make scales more effective at detecting positive cases of depression than at screening out negative cases in the population (Labrique and Pan, 2010).
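The dependence of predictive values on prevalence follows directly from Bayes’ theorem. The sketch below assumes a hypothetical screen with sensitivity 0.90 and specificity 0.60 (illustrative values, not pooled estimates from this review) and shows how PPV and NPV shift as prevalence changes.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV implied by Bayes' theorem for a given prevalence (illustrative helper)."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # P(depressed | positive screen)
    npv = true_neg / (true_neg + false_neg)  # P(not depressed | negative screen)
    return ppv, npv


# Hypothetical screen: sensitive but not very specific, as was common after adaptation.
for prevalence in (0.05, 0.15, 0.30):
    ppv, npv = predictive_values(sensitivity=0.90, specificity=0.60, prevalence=prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.2f}, NPV {npv:.2f}")
```

At 5% prevalence only about 1 in 10 positive screens is a true case (PPV ≈ 0.11), and PPV rises to about 0.49 at 30%, while NPV stays above 0.9 throughout. A high-sensitivity, low-specificity screen therefore tends to yield exactly the high-NPV, low-PPV profile described above.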

Moreover, a pattern of low PPV suggests that the added or modified scale items may not have been disorder-specific after cultural adaptation. Our results on the NPV and PPV of adapted scales align with previous literature stating that predictive values are among the most important metrics guiding clinical practitioners’ treatment decisions (Labrique and Pan, 2010). Predictive values indicate the diagnostic capability of the test in the real world and are thus referred to as the scale’s real-world performance, or its clinical relevance for screening individuals who may benefit from a more accurate diagnostic interview and diagnosis-guided treatment (Labrique and Pan, 2010). Although few of the included studies reported predictive values, previous work has stressed that this psychometric evaluation should not be overlooked. Predictive values reflect congruity with Western-based ideas about how depression is defined, how it manifests, and how it should optimally be treated; community knowledge may enhance clinical utility and safety in presentations that do not match Western ideas (Haswell et al., 2010).

The studies indicate that increasing the clinical relevance of adapted scales requires increasing both NPV (minimizing false negatives) and PPV (minimizing false positives). The findings of the original studies suggest that, to increase NPV after adaptation, it is best to avoid or exclude scale items that are likely to be endorsed easily by the entire population (i.e., items not measuring a locally specific symptom of depression), as these may inaccurately inflate the sensitivity of the scale for a true depressive case (Mitchell and Beals, 2011; Sarkar et al., 2015; Simon, 2015). Additionally, to increase PPV and specificity, it is recommended to focus questions on the motivational, cognitive, and affective components of depression rather than on symptoms that may arise from non-psychiatric medical conditions, as these can inflate scores and make the scale less specific for a depressive case (Ganguli et al., 1999; Simon, 2015). The studies noted that increasing specificity is particularly necessary in low-resource settings, such as those where most adaptations were conducted. A highly specific screen in Indigenous communities allows clinicians to allocate resources effectively, so that those most in need receive timely treatment.

The AUC of a few studies was high, showing that the adapted scales had good discriminative capacity across possible cut-off points (Schwarzbold et al., 2014). However, because few studies assessed the AUC, we cannot reach a strong conclusion on the discrimination properties of most types of adapted depression scales. Similarly, the LRs of most adapted scales in the studies were moderate, but the data are insufficient to reach conclusions about their ability to screen for probable depression. The LR compares the likelihood of screening positive (i.e., scoring above the cut-off point of the adapted scale) for a person with current MDD with that for someone without MDD; a high LR therefore means that a positive screen is much more likely in people with MDD (Hackett et al., 2019).
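Because the AUC summarizes discrimination over every possible cut-off, it can be computed directly from the scores of cases and non-cases. The sketch below uses the Mann-Whitney formulation (a pairwise loop chosen for clarity rather than efficiency) on hypothetical adapted-scale totals; the function name and data are illustrative assumptions. An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation.

```python
def auc(case_scores, noncase_scores):
    """Empirical ROC AUC via the Mann-Whitney statistic.

    Returns the probability that a randomly chosen case scores higher than a
    randomly chosen non-case, counting ties as one half. No cut-off is involved.
    """
    if not case_scores or not noncase_scores:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for case in case_scores:
        for noncase in noncase_scores:
            if case > noncase:
                wins += 1.0
            elif case == noncase:
                wins += 0.5
    return wins / (len(case_scores) * len(noncase_scores))


# Hypothetical adapted-scale totals, split by reference-standard diagnosis.
cases = [14, 18, 11, 20, 16]
noncases = [6, 9, 12, 4, 10, 8]
print(f"AUC = {auc(cases, noncases):.3f}")
```

Here 29 of the 30 case/non-case pairs are ordered correctly, giving an AUC of about 0.967; choosing a particular cut-off afterwards changes sensitivity, specificity, and the LRs, but not this value.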

When could a cultural adaptation be useful?

Ideally, cultural adaptations would be useful when balancing two goals: (a) increasing acceptability by translating Western constructs of psychological problems to the Indigenous context, and (b) supporting treatment (Kohrt et al., 2014; Chan et al., 2021). The strength of cultural adaptation is that it can improve the reliability and validity of scale items through changes to language or to the representation of depression constructs. At the same time, these changes carry inextricable limitations, including their impact on clinical utility.

Symptom measures are a Western concept, and Western medicine has been built to treat individuals based on symptom severity (Bredström, 2019). Doctors need certain criteria to evaluate the need for treatment and to define recovery. Researchers have also used scale cut-offs for intake and to define how many participants benefited from a given intervention. An existing gold standard for diagnosis is essential for validation after cultural adaptation and for validating a screen’s diagnostic cut-offs. Without tools that provide an accurate prevalence estimate or diagnosis, it is difficult to ascertain the true clinical utility of adapted scales. In Indigenous settings, defining a gold standard for depression is harder, and a gold-standard measure is often unavailable (Kaaya et al., 2008; Haroz et al., 2014).

Future lines of work

Action is needed to address knowledge gaps around Indigenous mental health constructs and to understand how interventions, policies, or programs can support the unmet needs of these populations. This first requires an understanding of Indigenous concepts of emotional or mental distress and wellbeing, which forms a basis for treatments as well as for tools to identify individuals and communities in need of support and interventions. However, in some instances, cultural adaptation might not be the primary approach to building trust and supporting empowerment (Gomez Cardona et al., 2021). As an alternative to cultural adaptation, methods of administration could also increase cultural safety. These include, at a minimum, the setting, language, involvement of community members in administration, and visual elements (Gomez Cardona et al., Submitted).

It is not rare for Indigenous communities to question a symptom-focused approach that uses distinct cut-offs (Gomez Cardona et al., 2021). In fact, research teams have worked to co-design culture-specific tools for culturally based, often community-targeted interventions that support empowerment rather than symptom reduction to meet cut-offs (Haswell et al., 2010; Gomez Cardona et al., 2021). Thus, if a non-symptom-based approach to intervention is adopted, symptom-based measures will not be needed. This should be a main consideration when evaluating whether a depression measure needs to be culturally adapted for a particular Indigenous group.

In the future, studies should include a needs assessment before developing new screening tools. If modifications are warranted, qualitative methods may help to identify the community’s needs that a novel or adapted scale could address (Gomez Cardona et al., Submitted). At the same time, adaptation processes must yield stable and clinically useful tools. This was a shortcoming across the studies in this review, which rarely tested multiple psychometric domains comprehensively. It is necessary to evaluate not only the reliability and validity of the scale after changes, but also its utility in accurately detecting the risk of depression.

Similar to other researchers who have adapted or developed new scales for use with Indigenous peoples, we advocate for future studies to examine scales’ sensitivity to change (Haswell et al., 2010). Sensitivity to change reflects the ability of the scale to detect changes in mood and, as such, is an essential characteristic for evaluating treatment response. Evaluation and treatment of mental health rely on stable, measurement-invariant tools for screening those in need of treatment, which should also signal the direction and magnitude of treatment effects to understand their efficacy for a particular population (Fitzpatrick et al., 2019).
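One common way to quantify sensitivity to change is the standardized response mean (SRM): the mean within-person change divided by the standard deviation of that change, so that its sign gives the direction and its size the magnitude of change. The sketch below assumes the SRM as the index (other responsiveness indices exist), and the function name and pre/post scores are hypothetical.

```python
from statistics import mean, stdev


def standardized_response_mean(before, after):
    """Mean within-person change divided by the standard deviation of that change."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need at least two paired observations")
    changes = [post - pre for pre, post in zip(before, after)]
    return mean(changes) / stdev(changes)


# Hypothetical depression scores before and after an intervention (lower = fewer symptoms).
before = [20, 18, 25, 22]
after = [14, 15, 17, 20]
print(f"SRM = {standardized_response_mean(before, after):.2f}")
```

A negative SRM here indicates symptom reduction, and its absolute size (about 1.7 with these invented scores) can be compared across scales given to the same participants to judge which one is more responsive to change.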

Limitations of the current study

Although five databases were systematically searched, it is possible that not all relevant studies were identified. To reduce this risk, gray literature found through hand-searching was incorporated into the review; this included searching open-access repositories and checking the references of included articles. Several key concepts lack globally accepted terms, most importantly, Indigenous and cultural adaptation, which may have limited the scope of our screening. The conclusions about validity and utility are limited by the fact that the included reports on cultural adaptation described a limited number of characteristics. This limits the study’s ability to draw strong conclusions that could guide the selection of specific tools for clinical use or research.

We only considered studies published in English. This ensured that data for the studies’ validation analyses were uniform; however, some adaptation processes may have been left out of this review if they were reported in another language. Further validation, analysis, and adaptation of new and existing measures of depression are needed to confirm the applicability of such instruments in culturally distinct populations.

Risk of bias

All four ROBIS domains were completed, indicating that this review was conducted with a low risk of bias. These domains pertain to study eligibility criteria, identification and selection of studies, data collection and study appraisal, and synthesis and findings. The conclusions of the review are supported by the evidence presented and consider the relevance of the included studies. The synthesis methodology was driven by the nature of the studies and our research objectives; however, because no meta-analysis was conducted, no statistical synthesis methods were used.

Conclusion

Through a review of the literature, we found evidence that cultural adaptation may increase the validity and reliability of depression scales used in the mental health assessment of Indigenous populations across the world. The current review supports adapting scales to fit the Indigenous context, which increases their acceptability to community members and their overall consistency. At the same time, psychometric testing of adapted scales highlights a potential caveat: a loss of clinical utility through excessively high sensitivity and low specificity. Cultural adaptation of depression assessments for Indigenous populations would be clinically useful when balancing two goals: (1) increasing acceptability by translating Western constructs of psychological problems to the Indigenous context, and (2) supporting treatment.

Open peer review

To view the open peer review materials for this article, please visit http://doi.org/10.1017/gmh.2023.52.

Supplementary material

The supplementary material for this article can be found at http://doi.org/10.1017/gmh.2023.52.

Acknowledgments

We would like to thank Ms. Andrea Quaiattini from the McGill University Library for her assistance with our comprehensive systematic search procedure. We also express our gratitude to Indigenous co-authors in other publications and to other community members in Montreal and Kahnawà:ke for introducing us to some basic concepts of their view of health and wellbeing as opposed to symptom measurement.

Author contribution

Conceptualization: L.G.C., O.L.; Formal analysis: M.Y., Q.S., L.G.C., V.N., M.K., G.V., O.L.; Funding acquisition: O.L., L.G.C.; Investigation: M.Y., Q.S., V.N., M.K., O.L.; Methodology: L.G.C., Q.S., M.Y., O.L., V.N.; Project administration: O.L.; Resources: O.L.; Supervision: O.L.; Validation: O.L.; Visualization: M.Y., O.L.; Writing – original draft: M.Y., Q.S., L.G.C., O.L.; Writing – review and editing: M.Y., Q.S., L.G.C., V.N., M.K., G.V., O.L.

Financial support

O.L. was funded by grants from FSISSS (#8400958), CIHR Institute of Indigenous Peoples’ Health (#426678), FRSQ (#252872 and #265693O), and the Réseau Québécois sur le suicide, les troubles de l’humeur et les troubles associés. L.G.C. had financial support from the Réseau universitaire intégré de santé et services sociaux (RUISSS) McGill and CIHR Institute of Indigenous Peoples’ Health (#430331). Q.S. had funding from McGill University’s Healthy Brains Healthy Lives fellowship. The funders had no role in study design or in subsequent data collection and analysis.

Competing interest

The authors report no competing interests. The authors alone are responsible for the content and writing of the paper.

References

Aboraya, A, Nasrallah, HA, Elswick, DE, Ahmed, E, Estephan, N, Aboraya, D and Dohar, S (2018) Measurement-based care in psychiatry-past, present, and future. Innovations in Clinical Neuroscience 15(11–12), 13–26.

Almeida, OP, Flicker, L, Fenner, S, Smith, K, Hyde, Z, Atkinson, D, Skeaf, L, Malay, R and LoGiudice, D (2014) The Kimberley assessment of depression of older Indigenous Australians: Prevalence of depressive disorders, risk factors and validation of the KICA-dep scale. PLoS One 9(4), e94983. https://doi.org/10.1371/journal.pone.0094983

Andrade, C (2018) Internal, external, and ecological validity in research design, conduct, and evaluation. Indian Journal of Psychological Medicine 40(5), 498–499. https://doi.org/10.4103/ijpsym.Ijpsym_334_18

Armenta, BE, Sittner Hartshorn, KJ, Whitbeck, LB, Crawford, DM and Hoyt, DR (2014) A longitudinal examination of the measurement properties and predictive utility of the Center for Epidemiologic Studies Depression Scale among North American Indigenous adolescents. Psychological Assessment 26(4), 1347–1355. https://doi.org/10.1037/a0037608

Ashaba, S, Cooper-Vince, C, Vořechovská, D, Maling, S, Rukundo, GZ, Akena, D and Tsai, AC (2019) Development and validation of a 20-item screening scale to detect major depressive disorder among adolescents with HIV in rural Uganda: A mixed-methods study. SSM - Population Health 7, 100332. https://doi.org/10.1016/j.ssmph.2018.100332

Bader, M, Jobst, LJ, Zettler, I, Hilbig, BE and Moshagen, M (2021) Disentangling the effects of culture and language on measurement noninvariance in cross-cultural research: The culture, comprehension, and translation bias (CCT) procedure. Psychological Assessment 33(5), 375–384. https://doi.org/10.1037/pas0000989

Baron, EC, Davies, T and Lund, C (2017) Validation of the 10-item Centre for Epidemiological Studies Depression Scale (CES-D-10) in Zulu, Xhosa and Afrikaans populations in South Africa. BMC Psychiatry 17(1), 6. https://doi.org/10.1186/s12888-016-1178-x

Bass, JK, Ryder, RW, Lammers, M-C, Mukaba, TN and Bolton, PA (2008) Post-partum depression in Kinshasa, Democratic Republic of Congo: Validation of a concept using a mixed-methods cross-cultural approach. Tropical Medicine & International Health 13(12), 1534–1542. https://doi.org/10.1111/j.1365-3156.2008.02160.x

Boksa, P, Joober, R and Kirmayer, LJ (2015) Mental wellness in Canada’s Aboriginal communities: Striving toward reconciliation. Journal of Psychiatry & Neuroscience 40(6), 363–365. https://doi.org/10.1503/jpn.150309

Bougie, E, Arim, RG, Kohen, DE and Findlay, LC (2016) Validation of the 10-item Kessler Psychological Distress Scale (K10) in the 2012 Aboriginal peoples survey. Health Reports 27(1), 3–10.

Bredström, A (2019) Culture and context in mental health diagnosing: Scrutinizing the DSM-5 revision. Journal of Medical Humanities 40(3), 347–363. https://doi.org/10.1007/s10912-017-9501-1

Campbell, A, Hayes, B and Buckby, B (2008) Aboriginal and Torres Strait Islander women’s experience when interacting with the Edinburgh Postnatal Depression Scale: A brief note. Australian Journal of Rural Health 16(3), 124–131. https://doi.org/10.1111/j.1440-1584.2007.00930.x

Caneo, C, Toro, P and Ferreccio, C (2020) Validity and performance of the Patient Health Questionnaire (PHQ-2) for screening of depression in a rural Chilean cohort. Community Mental Health Journal 56(7), 1284–1291. https://doi.org/10.1007/s10597-020-00605-8

Cartagena-Ramos, D, Fuentealba-Torres, M, Rebustini, F, Leite, A, Alvarenga, WA, Arcêncio, RA, Dantas, RAS and Nascimento, LC (2018) Systematic review of the psychometric properties of instruments to measure sexual desire. BMC Medical Research Methodology 18(1), 109. https://doi.org/10.1186/s12874-018-0570-2

Chan, AW, Reid, C, Skeffington, P and Marriott, R (2021) A systematic review of EPDS cultural suitability with Indigenous mothers: A global perspective. Archives of Women’s Mental Health 24(3), 353–365. https://doi.org/10.1007/s00737-020-01084-2

Chapla, A, Prabhakaran, A, Ganjiwale, J, Nimbalkar, S and Kharod, N (2019) Validation of the Gujarati version of Center for Epidemiological Studies Depression Scale for Children (CES-DC) and prevalence of depressive symptoms amongst school going adolescents in Gujarat, India. Journal of Clinical and Diagnostic Research 13(12), VC06–VC11. https://doi.org/10.7860/JCDR/2019/41001.13377

Chapleski, EE, Lamphere, JK, Kaczynski, R, Lichtenberg, PA and Dwyer, JW (1997) Structure of a depression measure among American Indian elders: Confirmatory factor analysis of the CES-D scale. Research on Aging 19(4), 462–485. https://doi.org/10.1177/0164027597194004

Denckla, CA, Ndetei, DM, Mutiso, VN, Musyimi, CW, Musau, AM, Nandoya, ES, Anderson, KK, Milanovic, S, Henderson, D and McKenzie, K (2017) Psychometric properties of the Ndetei-Othieno-Kathuku (NOK) scale: A mental health assessment tool for an African setting. Journal of Child and Adolescent Mental Health 29(1), 39–49. https://doi.org/10.2989/17280583.2017.1310729

Ekeroma, AJ, Ikenasio-Thorpe, B, Weeks, S, Kokaua, J, Puniani, K, Stone, P and Foliaki, SA (2012) Validation of the Edinburgh Postnatal Depression Scale (EPDS) as a screening tool for postnatal depression in Samoan and Tongan women living in New Zealand. New Zealand Medical Journal 125(1355), 41–49.

Esler, D, Johnston, F, Thomas, D and Davis, B (2008) The validity of a depression screening tool modified for use with Aboriginal and Torres Strait Islander people. Australian and New Zealand Journal of Public Health 32(4), 317–321. https://doi.org/10.1111/j.1753-6405.2008.00247.x

Esler, DM, Johnston, F and Thomas, D (2007) The acceptability of a depression screening tool in an urban, Aboriginal community-controlled health service. Australian and New Zealand Journal of Public Health 31(3), 259–263.

Fernandes, MC, Srinivasan, K, Stein, AL, Menezes, G, Sumithra, R and Ramchandani, PG (2011) Assessing prenatal depression in the rural developing world: A comparison of two screening measures. Archives of Women’s Mental Health 14(3), 209–216. https://doi.org/10.1007/s00737-010-0190-2

Fitzpatrick, SA, Haswell, MR, Williams, MM, Nathan, S, Meyer, L, Ritchie, JE and Jackson Pulver, LR (2019) Learning about Aboriginal health and wellbeing at the postgraduate level: Novel application of the growth and empowerment measure. Rural and Remote Health 19,2(2019), 4708. https://doi:10.22605/RRH4708 Google Scholar PubMed

Gallis, JA, Maselko, J, O’Donnell, K, Song, K, Saqib, K, Turner, EL and Sikander, S (2018) Criterion-related validity and reliability of the Urdu version of the patient health questionnaire in a sample of community-based pregnant women in Pakistan. PeerJ (San Francisco, CA) 6, e5185–e5185. https://doi.org/10.7717/peerj.5185 Google Scholar

Ganguli, M, Dube, S, Johnston, JM, Pandav, R, Chandra, V and Dodge, HH (1999) Depressive symptoms, cognitive impairment and functional impairment in a rural elderly population in India: A Hindi version of the Geriatric Depression Scale (GDS-H). International Journal of Geriatric Psychiatry 14(10), 807–820.3.0.CO;2-#>CrossRef Google Scholar

Gelaye, B, Williams, MA, Lemma, S, Deyessa, N, Bahretibeb, Y, Shibre, T, Wondimagegn, D, Lemenhe, A, Fann, JR, Vander Stoep, A and Andrew Zhou, XH (2013) Validity of the Patient Health Questionnaire-9 for depression screening and diagnosis in East Africa. Psychiatry Research 210(2), 653–661. https://doi.org/10.1016/j.psychres.2013.07.015 CrossRef Google Scholar PubMed

Gomez Cardona, L, Brown, K, McComber, M, Outerbridge, J, Parent-Racine, E, Phillips, A, Boyer, C, Martin, C, Splicer, B, Thompson, D, Yang, M, Velupillai, G, Laliberté, A, Haswell, M and Linnaranta, O (2021) Depression or resilience? A participatory study to identify an appropriate assessment tool with Kanien’kéha (Mohawk) and Inuit in Quebec. Social Psychiatry and Psychiatric Epidemiology 56(10), 1891–1902. https://doi.org/10.1007/s00127-021-02057-1

Gomez Cardona, L, Yang, M, Seon, Q, Karia, M, Velupillai, G, Noel, V and Linnaranta, O (Submitted) The Methods of Improving Cultural Sensitivity of Depression Scales for Use among Indigenous Populations: A Systematic Scoping Review. Montreal, QC: Douglas Mental Health University Institute.

Hackett, ML, Teixeira-Pinto, A, Farnbach, S, Glozier, N, Skinner, T, Askew, DA, Gee, G, Cass, A and Brown, A (2019) Getting it right: Validating a culturally specific screening tool for depression (aPHQ-9) in Aboriginal and Torres Strait Islander Australians. Medical Journal of Australia 211(1), 24–30. https://doi.org/10.5694/mja2.50212

Haroz, EE, Bass, J, Lee, C, Oo, SS, Lin, K, Kohrt, B, Michalopolous, L, Nguyen, AJ and Bolton, P (2017) Development and cross-cultural testing of the International Depression Symptom Scale (IDSS): A measurement instrument designed to represent global presentations of depression. Global Mental Health (Cambridge) 4, e17. https://doi.org/10.1017/gmh.2017.16

Haroz, EE, Bass, JK, Lee, C, Murray, LK, Robinson, C and Bolton, P (2014) Adaptation and testing of psychosocial assessment instruments for cross-cultural use: An example from the Thailand Burma border. BMC Psychology 2(1), 31. https://doi.org/10.1186/s40359-014-0031-6

Harry, ML and Crea, TM (2018) Examining the measurement invariance of a modified CES-D for American Indian and non-Hispanic White adolescents and young adults. Psychological Assessment 30(8), 1107–1120. https://doi.org/10.1037/pas0000553

Haswell, MR, Kavanagh, D, Tsey, K, Reilly, L, Cadet-James, Y, Laliberte, A, Wilson, A and Doran, C (2010) Psychometric validation of the growth and empowerment measure (GEM) applied with Indigenous Australians. Australian & New Zealand Journal of Psychiatry 44(9), 791–799. https://doi.org/10.3109/00048674.2010.482919

Husain, N, Gater, R, Tomenson, B and Creed, F (2006) Comparison of the Personal Health Questionnaire and the Self Reporting Questionnaire in rural Pakistan. The Journal of the Pakistan Medical Association 56(8), 366–370.

Kaaya, SF, Lee, B, Mbwambo, JK, Smith-Fawzi, MC and Leshabari, MT (2008) Detecting depressive disorder with a 19-item local instrument in Tanzania. International Journal of Social Psychiatry 54(1), 21–33. https://doi.org/10.1177/0020764006075024

Kilburn, K, Prencipe, L, Hjelm, L, Peterman, A, Handa, S and Palermo, T (2018) Examination of performance of the Center for Epidemiologic Studies Depression Scale Short Form 10 among African youth in poor, rural households. BMC Psychiatry 18(1), 201. https://doi.org/10.1186/s12888-018-1774-z

Kohrt, BA, Rasmussen, A, Kaiser, BN, Haroz, EE, Maharjan, SM, Mutamba, BB, de Jong, JT and Hinton, DE (2014) Cultural concepts of distress and psychiatric disorders: Literature review and research recommendations for global mental health epidemiology. International Journal of Epidemiology 43(2), 365–406. https://doi.org/10.1093/ije/dyt227

Labrique, AB and Pan, WKY (2010) Diagnostic tests: Understanding results, assessing utility, and predicting performance. American Journal of Ophthalmology 149, e872–881. https://doi.org/10.1016/j.ajo.2010.01.001

Leung, L (2016) Diabetes mellitus and the Aboriginal diabetic initiative in Canada: An update review. Journal of Family Medicine and Primary Care 5(2), 259–265. https://doi.org/10.4103/2249-4863.192362

Marley, JV, Kotz, J, Engelke, C, Williams, M, Stephen, D, Coutinho, S and Trust, SK (2017) Validity and acceptability of Kimberley Mum’s mood scale to screen for perinatal anxiety and depression in remote Aboriginal health care settings. PLoS One 12(1), e0168969. https://doi.org/10.1371/journal.pone.0168969

Marsella, A, Sartorius, N and Jablensky, A (1985) Cross-cultural studies of depressive disorders: An overview. In Kleinman, A and Good, B (eds), Culture and Depression: Studies in the Anthropology and Cross-Cultural Psychiatry of Affect and Disorder. Berkeley: University of California Press, pp. 299–324.

Mayberry, RM, Mili, F and Ofili, E (2000) Racial and ethnic differences in access to medical care. Medical Care Research and Review 57(4), 108–145. https://doi.org/10.1177/107755800773743628

McNamara, BJ, Banks, E, Gubhaju, L, Williamson, A, Joshy, G, Raphael, B and Eades, SJ (2014) Measuring psychological distress in older Aboriginal and Torres Strait Islanders Australians: A comparison of the K-10 and K-5. Australian and New Zealand Journal of Public Health 38(6), 567–573. https://doi.org/10.1111/1753-6405.12271

Mitchell, CM and Beals, J (2011) The utility of the Kessler screening scale for psychological distress (K6) in two American Indian communities. Psychological Assessment 23(3), 752–761. https://doi.org/10.1037/a0023288

Panayides, P and Walker, MJ (2013) Evaluating the psychometric properties of the foreign language classroom anxiety scale for Cypriot senior high school EFL students: The Rasch measurement approach. Europe’s Journal of Psychology 9(3), 493–516. https://doi.org/10.5964/ejop.v9i3.611

Parikh, R, Mathai, A, Parikh, S, Chandra Sekhar, G and Thomas, R (2008) Understanding and using sensitivity, specificity and predictive values. Indian Journal of Ophthalmology 56(1), 45–50. https://doi.org/10.4103/0301-4738.37595

Porcerelli, JH and Jones, JR (2017) Uses of psychological assessment in primary care settings. In Maruish, ME (ed.), Handbook of Psychological Assessment in Primary Care Settings. New York: Routledge, pp. 75–94. https://doi.org/10.4324/9781315658407

Sarkar, S, Kattimani, S, Roy, G, Premarajan, KC and Sarkar, S (2015) Validation of the Tamil version of short form Geriatric Depression Scale-15. Journal of Neurosciences in Rural Practice 6(3), 442–446. https://doi.org/10.4103/0976-3147.158800

Schantz, K, Reighard, C, Aikens, JE, Aruquipa, A, Pinto, B, Valverde, H and Piette, JD (2017) Screening for depression in Andean Latin America: Factor structure and reliability of the CES-D short form and the PHQ-8 among Bolivian public hospital patients. International Journal of Psychiatry in Medicine 52(4–6), 315–327. https://doi.org/10.1177/0091217417738934

Schneider, M, Baron, E, Davies, T, Bass, J and Lund, C (2015) Making assessment locally relevant: Measuring functioning for maternal depression in Khayelitsha, Cape Town. Social Psychiatry and Psychiatric Epidemiology 50(5), 797–806. https://doi.org/10.1007/s00127-014-1003-0

Schouten, BC and Meeuwesen, L (2006) Cultural differences in medical communication: A review of the literature. Patient Education and Counseling 64(1), 21–23. https://doi.org/10.1016/j.pec.2005.11.014

Schwarzbold, ML, Diaz, AP, Nunes, JC, Sousa, DS, Hohl, A, Guarnieri, R, Linhares, MN and Walz, R (2014) Validity and screening properties of three depression rating scales in a prospective sample of patients with severe traumatic brain injury. Brazilian Journal of Psychiatry 36(3), 206–212. https://doi.org/10.1590/1516-4446-2013-1308

Shen, YT, Radford, K, Daylight, G, Cumming, R, Broe, TGA and Draper, B (2018) Depression, suicidal behaviour, and mental disorders in older Aboriginal Australians. International Journal of Environmental Research and Public Health 15(3), 447. https://doi.org/10.3390/ijerph15030447

Sijtsma, K (2009) On the use, the misuse, and the very limited usefulness of Cronbach’s alpha. Psychometrika 74(1), 107–120. https://doi.org/10.1007/s11336-008-9101-0

Simon, B and Catherine, W (2009) Cultural safety: Exploring the applicability of the concept of cultural safety to Aboriginal health and community wellness. Journal of Aboriginal Health 5(2), 6.

Simon, R (2015) Sensitivity, specificity, PPV, and NPV for predictive biomarkers. Journal of the National Cancer Institute 107(8), djv153. https://doi.org/10.1093/jnci/djv153

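Tiburcio Sainz, M and Natera Rey, G (2007) Adaptación al contexto ñahñú del Cuestionario de Enfrentamientos (CQ), la Escala de Síntomas (SRT) y la Escala de Depresión del Centro de Estudios Epidemiológicos (CES-D) [Adaptation to the ñahñú context of the Coping Questionnaire (CQ), the Symptom Scale (SRT) and the Center for Epidemiologic Studies Depression Scale (CES-D)]. Salud Mental (México) 30(3), 48–58.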

Whiting, P, Savović, J, Higgins, JP, Caldwell, DM, Reeves, BC, Shea, B, Davies, P, Kleijnen, J, Churchill, R and ROBIS group (2016) ROBIS: A new tool to assess risk of bias in systematic reviews was developed. Journal of Clinical Epidemiology 69, 225–234. https://doi.org/10.1016/j.jclinepi.2015.06.005

Figure 1. PRISMA flow diagram.

Table 1. Characteristics of culturally adapted scales

Table 2. Psychometric properties of adapted scales

Yang et al. supplementary material 1

Yang et al. supplementary material 2

Yang et al. supplementary material 3

Yang et al. supplementary material 4

Author comment: Safe and valid? A systematic review of the psychometric properties of culturally adapted depression scales for use among Indigenous populations — R0/PR1

Published online by Cambridge University Press: 14 September 2023

DOI: https://doi.org/10.1017/gmh.2023.52.pr1

Outi Linnaranta

Mental Health, Finnish Institute for Health and Welfare, Finland

Revision round: 0

Role: author

Comments

We submit our systematic review “Safe and valid? A systematic review of the psychometric properties of culturally adapted depression scales for use among Indigenous populations” for consideration in your journal. This complements our scoping review on methods of cultural adaptation of depression scales. The present study suggests that modifying depression scales to fit the Indigenous context through changes to language or question structure is a culturally sensitive strategy that increases acceptance of psychological evaluation and treatment in communities. However, increasing acceptability must be balanced with maintaining the clinical utility of instruments. The high prevalence of depression in these populations must be taken into account when developing culturally sensitive but specific tools.

Yours,

Outi Linnaranta, MD, PhD

Chief Physician

Finnish Institute for Health and Welfare

Helsinki, Finland
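The cover letter's point about prevalence can be made concrete with Bayes' rule. The sketch below is a minimal illustration, assuming a hypothetical screen with fixed sensitivity (0.85) and specificity (0.75); these values and the function name `predictive_values` are illustrative only and are not taken from any of the reviewed scales. Holding sensitivity and specificity constant, it computes the positive and negative predictive values (PPV, NPV) at several prevalence levels.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Return (PPV, NPV) for a screen with the given accuracy at a given prevalence.

    Uses Bayes' rule with the expected proportion of screened people who are
    true positives, false positives, false negatives and true negatives.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Hypothetical screen (illustrative values only, not from the reviewed studies).
SENSITIVITY = 0.85
SPECIFICITY = 0.75

for prevalence in (0.05, 0.20, 0.40):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.2f}  NPV={npv:.2f}")
```

As prevalence rises, PPV increases and NPV decreases even though sensitivity and specificity are unchanged, which is why the prevalence in the setting matters when judging what a screen's results mean in practice.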

Review: Safe and valid? A systematic review of the psychometric properties of culturally adapted depression scales for use among Indigenous populations — R0/PR2

DOI: https://doi.org/10.1017/gmh.2023.52.pr2

Reviewer_1

Date of review: 07 June 2023

Revision round: 0

Role: reviewer

Recommendation/decision: minor-revision

Conflict of interest statement

Reviewer declares none.

Comments

Thank you for the opportunity to review the manuscript “Safe and Valid? A Systematic Review of the Psychometric Properties of Culturally Adapted Depression Scales for Use Among Indigenous Populations.” This well-written manuscript seeks to expand the literature on the cultural safety and validity of adapted depression scales for Indigenous populations. Using a systematic review following the PRISMA guidelines, the authors synthesize and summarize the global evidence for the psychometric properties of various depression scales culturally adapted for Indigenous populations.

Reviewer’s Comments:

Strengths

The manuscript will make a significant, high-impact contribution to the field.

The manuscript is original and fills a gap in the literature.

The manuscript covers global content, including research inclusion, presentation of results, and discussion.

The findings will contribute to advancing knowledge in the field.

The authors convey their ideas and present their results in an organized and structured manner.

Areas for Improvement

The authors could have provided more details in the manuscript’s methodology.

A PRISMA flow diagram would be helpful.

There are some areas where the manuscript could benefit from improved clarity or more concise language.

1. In the conclusion section of the abstract, the statement indicating a reduction in specificity and negative predictive value does not logically follow the results presented in the abstract.

2. On Page 5, line 9, the statement “that are unique to other ethnic groups” suggests shared health beliefs with other ethnic groups rather than a unique perspective.

3. Page 5, line 12 – would better psychometric evaluation improve healthcare access, as stated, or providers' ability to diagnose/treat?

4. Page 6, line 28 – The text “for mental health crises amongst stakeholders” is confusing as written and suggests that awareness is for mental health crises among stakeholders.

5. Page 6, line 31 – How should measures be comparable?

6. Page 9, line 103 – Identifying the instruments with low alphas would be helpful.

7. Page 12, line 178 – As written, the authors suggest that all adapted scales performed similarly. Clarification would be helpful.

Many references were published more than ten years ago – I recommend updating references where possible.

Review: Safe and valid? A systematic review of the psychometric properties of culturally adapted depression scales for use among Indigenous populations — R0/PR3

DOI: https://doi.org/10.1017/gmh.2023.52.pr3

Melissa Haswell

Queensland University of Technology Faculty of Health, Australia

Date of review: 25 June 2023

Revision round: 0

Role: reviewer

Recommendation/decision: minor-revision

Conflict of interest statement

Reviewer declares none.

Comments

This is an important piece of work, as for a very long time, there was little scrutiny of how depression scales were working among Indigenous populations. I make two comments on the strength and importance of the paper.

Distinguishing between an expected loss of social and emotional wellbeing due to ongoing grief and loss, adverse life events and heightened inequalities in social and environmental determinants of health vs clinical depression is critically important. Practitioners working cross-culturally need tools that can be relied on to decide whether to refer a client in distress to social and emotional wellbeing support only or to also refer to a mental health clinician for medical support and determine the level of urgency.

This study identified 13 tools in the literature that have been culturally adapted to support health workers and clinicians to do this, and revealed that the limits of their reliability, sensitivity, validity and predictive accuracy are infrequently determined. Its finding that some scale properties are enhanced by cultural adaptation while others are decreased allows readers to understand the potential strengths and limitations of adapted tools. The paper provides an important message to health services that the psychometric properties of the tools they select for the diagnosis of depression are important, and that other processes in the clinical pathway, such as stronger decision support tools, may be needed to assist anyone experiencing distress while better identifying those who would benefit most from seeing a mental health clinician. This is well described in the paper’s discussion.

I suggest the authors may explore the following document in the grey literature, which was not identified, probably because for 10 years it was made available through an internal section for mental health clinicians within a state health department, where it was downloaded hundreds of times; more recently it was placed on a site open to the public (https://apo.org.au/node/19597). A modified Kessler scale (K6+2) was used there, and its psychometric properties are provided in Haswell et al. (2010), the validation of the Growth and Empowerment Measure. However, I wish to clearly declare a COI here, as I am the lead author, and I leave it to the authors to decide whether it meets the inclusion criteria.

One very important quality not mentioned here is sensitivity to change. I am conscious of that because, in my own work, the Kessler scale has been very insensitive to change: it bounces around wildly between intervals, which I think reflects the external environment and completely normal responses to it, making it a poor reflection of the effectiveness of treatment. The need to assess this could be mentioned in the Discussion as an area for further work.

Mostly minor suggestions are below. Many are just small things about writing in the introduction, that I think could convey the meaning and importance more clearly for readers.

Impact Statement page 1 = the last sentence is incomplete, suggest deleting the word Considering.

Abstract Background –

Suggest the aim be stronger, not just to summarise but to critically examine, interrogate perhaps? Also suggest “However, the published findings on psychometric properties …”

The last sentence of the conclusion could also be reworded, e.g. starting with “There is an urgent need to…”

Introduction

Paragraph 1 = I suggest the authors consider placing the second sentence – Culturally (safe and) competent care can increase communication … Cultural safety is a combination … (authors’ discretion)

Perhaps include the word trauma-informed as well; some of the tools used in clinical settings have little regard for their potential to re-traumatise or make people feel worse, and this would also fit the title ‘Safe and valid?’

…suggest include: and to reduce communication problems (that interfere with accurate assessment).

Last sentence – perhaps clearer to state:

As a consequence, cultural adaptation aims to improve access (and reduce the risk of harm) ….

Para 2 = Indigenous communities often lack (culturally safe and appropriate resources) …

(sometimes it is not specialisation that’s needed; problems can arise with too much specialisation)…. This (can) result in limited access to interview-based ….

Line 28 = at community level can raise awareness (of) mental health (needs and) crises…. (However,) to make reliable …

Line 31 = are culturally safe [maybe trauma informed?] and clinically useful, reliable (when) used …

Line 33 = (Many) [several sounds more like three] standard qualities are required of psychometric screens and outcome measures (which are also essential for)

Line 36 = which (are) meaningful and relevant

Line 37 = clinical utility of a scale [omit depression as you seem to be speaking generally about these qualities] (assesses) its use by clinicians to diagnose and determine treatment.

Line 39 = identify people who are most likely to benefit from a clinical diagnostic interview …

Line 42 – suggest a new paragraph – The prevalence [delete rate as not a function of time] of depression and anxiety and incidence of suicide … Heightened presence of illness and suicide risk disproportionately raises the sensitivity [and reduces the specificity?]

***On this point, I think it is important to be very clear. Remember that whole communities can be at risk, not just individuals, so you don’t want reduced sensitivity in clinical diagnosis unless you acknowledge that people with less severe clinical depression may be left unidentified and untreated; this leaves the possibility that the depression may then progress to more severe manifestations if not assisted well, so you need to be very careful in meaning. Reduced specificity in diagnosing a clinical mental illness is more problematic, as it weakens the ability to distinguish a loss of social and emotional wellbeing, which could be best supported by community if possible rather than by diagnosis and medication. Need to be very careful here.

(again the document above https://apo.org.au/node/19597 suggests ways to address that in primary health care practice in Australia).

Line 50 = you may want to say “developing measures that reflect mental health conditions” not just depression.

Line 54 = we assess the (reported) quality …

Line 201 = see above, Line 42. Is this suggestion about clinicians coming from clinicians themselves? I would again be cautious suggesting that high sensitivity is not a desirable quality, it is the low specificity that is the problem.

Line 203 = I don’t understand the sentence starting “Yet…”; please clarify.

***Line 233 and Section 4.4 = very good points made here; predictive value scores reflect congruity with Western-based ideas about what depression is and how it manifests.

One could argue that anger (measured in K6+2 Haswell et al., 2010) has been ignored in depression by Western psychiatry – but is a prominent emotional response to continuous grief and loss and injustice that can mask depression (and possibly lead to suicide) – in this case, community knowledge may enhance clinical utility (safety) in presentations that don’t match Western ideas.

The rest of the paper is also excellent and clear.

Recommendation: Safe and valid? A systematic review of the psychometric properties of culturally adapted depression scales for use among Indigenous populations — R0/PR4

DOI: https://doi.org/10.1017/gmh.2023.52.pr4

Jane Fisher

School of Public Health and Preventive Medicine, Monash University Faculty of Medicine Nursing and Health Sciences, Australia

Date of review: 05 July 2023

Revision round: 0

Role: Handling Editor

Recommendation/decision: accept

Comments

No accompanying comment.

Decision: Safe and valid? A systematic review of the psychometric properties of culturally adapted depression scales for use among Indigenous populations — R0/PR5

DOI: https://doi.org/10.1017/gmh.2023.52.pr5

Dixon Chibanda

London School of Hygiene & Tropical Medicine, United Kingdom of Great Britain and Northern Ireland

Revision round: 0

Role: Editor in Chief

Recommendation/decision: minor-revision

Comments

No accompanying comment.

Author comment: Safe and valid? A systematic review of the psychometric properties of culturally adapted depression scales for use among Indigenous populations — R1/PR6

DOI: https://doi.org/10.1017/gmh.2023.52.pr6

Outi Linnaranta

Mental Health, Finnish Institute for Health and Welfare, Finland

Revision round: 1

Role: author

Comments

Please find enclosed our revised article “Safe and valid? A systematic review of the psychometric properties of culturally adapted depression scales for use among Indigenous populations”, which we resubmit for consideration in Cambridge Prisms: Global Mental Health.

Yours,

Outi Linnaranta, adjunct professor

McGill University, Canada

Review: Safe and valid? A systematic review of the psychometric properties of culturally adapted depression scales for use among Indigenous populations — R1/PR7

DOI: https://doi.org/10.1017/gmh.2023.52.pr7

Melissa Haswell

Queensland University of Technology Faculty of Health, Australia

Date of review: 14 August 2023

Revision round: 1

Role: reviewer

Recommendation/decision: accept

Conflict of interest statement

No competing interest exists.

Comments

Lines 21–31 are excellent.

I have no further comments.

Recommendation: Safe and valid? A systematic review of the psychometric properties of culturally adapted depression scales for use among Indigenous populations — R1/PR8

DOI: https://doi.org/10.1017/gmh.2023.52.pr8

Jane Fisher

School of Public Health and Preventive Medicine, Monash University Faculty of Medicine Nursing and Health Sciences, Australia

Date of review: 14 August 2023

Revision round: 1

Role: Handling Editor

Recommendation/decision: accept

Comments

Thank you for making the amendments. I am pleased to accept your revised paper.

Decision: Safe and valid? A systematic review of the psychometric properties of culturally adapted depression scales for use among Indigenous populations — R1/PR9

DOI: https://doi.org/10.1017/gmh.2023.52.pr9

Dixon Chibanda

London School of Hygiene & Tropical Medicine, United Kingdom of Great Britain and Northern Ireland
