INTRODUCTION
Surgical site infections (SSIs), which can be classified as superficial wound infections, deep wound infections, or periprosthetic joint infections (PJIs) [Reference Mangram1], are uncommon but serious complications of total joint replacements [Reference Peersman2, Reference Kurtz3]. PJIs can result in severe pain, functional deficits and even death [Reference Hunter and Dandy4–Reference Andersson6], and their management is a huge financial burden to health care systems [Reference Kallala7, Reference Vanhegan8]. With increasing life expectancy and a growing indication for primary joint replacements [Reference Kurtz9], there will be a proportionate rise in the number of patients who will be affected by PJIs. One approach to tackling the increasing incidence of PJIs is to identify those people at high risk and offer appropriate interventions. Early and accurate identification of individuals at high risk of PJI influences clinical decisions and development of targeted preventive strategies, and helps to optimise resources required for detection of PJI. Several factors, such as characteristics of the patient, surgical procedure and postoperative care, have been found to influence the risk of developing PJI [Reference Kunutsor10, Reference Triantafyllopoulos11]; however, their potential utility for PJI risk assessment remains uncertain.
A risk score or prognostic model is a statistical equation that predicts an individual's disease risk based on a combination of the values of multiple predictors or risk factors [Reference Steyerberg12]. Risk prediction scores are ideally developed using data from long-term follow-up of large population-based cohorts of individuals without a history of the event of interest (SSI or PJI in this case) at baseline. The dataset is used to identify important predictors and the model equation is developed [Reference Ensor13]. Using the derivation sample, the score's apparent performance is evaluated in a process known as internal validation. The next stage is external validation, which examines the generalisability of the model using new data. Risk prediction scores first emerged in the area of cardiovascular disease (CVD) prevention and have been widely used globally in clinical and public health practice. Well known amongst them is the Framingham CVD risk score [Reference Cook14] (a risk score which assesses an individual's risk of a cardiovascular event within 10 years), which is a commonly used algorithm in clinical practice and an accepted tool in preventive medicine.
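To make the idea of a risk equation concrete, the following is a minimal sketch assuming a logistic model. The predictor names, the intercept and the coefficient values are hypothetical, chosen only for illustration, and are not taken from any score discussed in this review.

```python
import math

# Hypothetical intercept and coefficients (log odds ratios).
# These values are illustrative assumptions, NOT from any published score.
INTERCEPT = -5.0
COEFFICIENTS = {
    "age_per_10_years": 0.10,
    "male_sex": 0.30,
    "bmi_over_40": 0.70,
    "diabetes": 0.40,
}


def predicted_risk(predictors: dict) -> float:
    """Return the predicted probability of the outcome from a logistic equation."""
    # Linear predictor: intercept plus the weighted sum of predictor values.
    linear_predictor = INTERCEPT + sum(
        COEFFICIENTS[name] * value for name, value in predictors.items()
    )
    # The inverse logit maps the linear predictor onto the 0-1 probability scale.
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# A hypothetical 70-year-old man with diabetes and a BMI below 40.
patient = {"age_per_10_years": 7.0, "male_sex": 1, "bmi_over_40": 0, "diabetes": 1}
risk = predicted_risk(patient)
print(f"Predicted risk: {risk:.1%}")
```

In a real score the coefficients would be estimated from the derivation cohort, and a Cox model would replace the logistic form when time to infection is modelled.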
Prevention of SSIs or PJI is a high policy priority and there has been an increasing interest in the development of risk prediction tools for SSI or PJI over the last decade. However, unlike the substantial progress made in CVD prevention using risk scores, the amount of progress made in the area of SSIs or PJIs is uncertain. There is therefore a need for objective data on the development of risk scores (including their component variables), their discriminative abilities, whether they have been externally validated, and whether their clinical effectiveness has been assessed in well-designed randomised controlled trials (RCTs). In this context, using systematic review methodology, we aimed to: (i) identify and summarise studies reporting the development of risk prediction scores for SSI or PJI; (ii) assess clinical variables selected for model inclusion and the predictive performance of these models; (iii) assess if identified models have been externally validated and their performances compared; (iv) assess if the impact or clinical effectiveness of these risk scores has been evaluated in appropriate RCTs and (v) identify gaps in the existing evidence and whether further research is needed in the field.
METHODS
This review was conducted using a predefined protocol, which has been registered in the PROSPERO prospective register of systematic reviews (CRD42016042158), and in line with PRISMA guidelines [Reference Moher15] (Supplementary Material 1). We searched MEDLINE, EMBASE, Web of Science and the Cochrane Library electronic databases up to 30 September 2016. The publicly available trial registers ClinicalTrials.gov, UKCRN (UK Clinical Research Network) Study Portfolio Database, and the WHO International Clinical Trials Registry Platform were also searched. The search strategy combined free-text and MeSH search terms and combinations of key words relating to risk prediction (e.g., ‘predict’, ‘risk score’, ‘sensitivity’), SSI or PJI (e.g., ‘periprosthetic joint infection’, ‘deep infection’, ‘surgical site infection’), and joint replacement (e.g., ‘hip replacement’, ‘knee replacement’, ‘hip arthroplasty’, ‘knee arthroplasty’). No restrictions were placed on publication dates and only articles published in English were considered. Reference lists of retrieved articles and relevant review articles identified on the topic were manually scanned for all relevant additional studies. A detailed description of all Materials and Methods, as well as the Literature Search Strategy, is available in Supplementary Materials 2 and 3.
RESULTS
Study identification and selection
Figure 1 shows the flow of studies through the review. Our literature search strategy identified 1802 potentially relevant articles. After the initial screening of titles and abstracts, 15 articles remained for further evaluation. Following detailed evaluation which included full-text reviews, six articles were excluded because (i) they were studies of diagnostic scores (n = 2) and (ii) they were studies of risk scores for outcomes such as readmission, infection eradication and treatment outcome of PJI (n = 4). The remaining nine articles met the inclusion criteria and were included in the review [Reference Paxton16–Reference Tikhilov24].
Study characteristics and quality assessment
Table 1 summarises characteristics of the studies in the sample. Studies were published between 2006 and 2016, with all but one appearing in 2011–2016. One study was reported as a published conference abstract [Reference Paxton16]. Overall, the studies involved 482 877 joint replacements, including 6968 SSIs or PJIs. For studies that reported age data, the baseline age of participants ranged from 56 to 81 years. The sample size of cohorts ranged from 217 to 172 055 and follow-up for infection outcomes ranged from 30 days to 2 years. For the assessment of infections, the majority of the studies used Centre for Disease Control or Infectious Diseases Society of America criteria. Studies classified infection outcomes as SSI or PJI specifically. One study employed both SSI and PJI outcomes [Reference Lewallen21] and another study used PJI recurrence [Reference Tikhilov24]. Quality assessment using PROBAST showed evidence of high overall risk of bias throughout the included studies. Five risk scores had unclear concern for overall applicability and only two scores were deemed to be usable in the targeted individuals and context (the National Healthcare Safety Network (NHSN) SSIs risk models for hip and knee arthroplasties (HPRO and KPRO)) [Reference Mu18] (Supplementary Material 4).
ATC, Anatomic, Therapeutic and Chemical Classification; CDC, Centre for Disease Control; HPRO, National Healthcare Safety Network surgical site infections risk model for hip arthroplasty; KPRO, National Healthcare Safety Network surgical site infections risk model for knee arthroplasty; NHSN, National Healthcare Safety Network; NR, not reported; NS, not stated; PJI, periprosthetic joint infection; SSI, surgical site infections; THA, total hip arthroplasty; TKA, total knee arthroplasty; ICD-10-AM, International Statistical Classification of Diseases and Related Health Problems, Tenth Revision, Australian Modification.
* Indicates the total number of SSIs for both THA and TKA.
Model description and development
Table 2 provides details of risk scores included in eligible studies: their component predictors, statistical properties, measures of discrimination and/or calibration, and reports of any validation and performance comparisons made. A total of 16 risk scores were described in the nine eligible studies. Five of these scores had separate models for hip and knee replacement patients [Reference Mu18, Reference Inacio22, Reference Maradit Kremers23]. Four studies described the development of two or more risk scores [Reference Mu18, Reference Berbari19, Reference Inacio22, Reference Maradit Kremers23]. All 16 risk scores were derivations of risk models on a base population and two of them were also externally validated on new populations [Reference Lewallen21]. Except for one study that developed the risk score based on a cohort recruited prospectively for the surveillance of SSIs [Reference Geubbels17], all studies retrospectively used datasets that had been established for other purposes. Except for the scores that were developed in both knee and hip replacement patients, the component predictors varied from score to score. However, age, sex and type of primary surgery featured in the majority of risk scores. Except for one score that was mainly based on invasive data such as ESR (erythrocyte sedimentation rate), CRP (C-reactive protein) and microbial aetiology [Reference Tikhilov24], all scores were based on data that can be assessed non-invasively such as demographics, anthropometrics, medical and surgical histories, and surgical procedures. The number of component variables in a single score ranged from 4 to 45 (n = 16, median 19, interquartile range 6·5–32·5). Seven out of the 16 risk scores had 10 or fewer components. Of the 16 risk scores, 15 used regression techniques (logistic or Cox) to develop the score and one used a classification tree [Reference Tikhilov24].
ASA, American Society of Anesthesiologists; BMI, body mass index; CRP, C-reactive protein; ESR, erythrocyte sedimentation rate; HL, Hosmer–Lemeshow; HPRO, National Healthcare Safety Network surgical site infections risk model for hip arthroplasty; KPRO, National Healthcare Safety Network surgical site infections risk model for knee arthroplasty; IDI, Integrated Discrimination Index; N/A, not applicable; NNIS, National Nosocomial Infection Surveillance; NHSN, National Healthcare Safety Network; NR, not reported; NS, not stated; PJI, periprosthetic joint infection; SSI, surgical site infection; THA, total hip arthroplasty; TKA, total knee arthroplasty.
† It was an external validation study of the risk models HPRO and KPRO.
Model diagnostics
Except for three studies (comprising three risk scores) [Reference Paxton16, Reference Bozic20, Reference Tikhilov24], the C-statistic was reported for 13 risk scores. The C-index ranged from 0·56 to 0·74. Only three risk scores were reported to have a discriminative ability of >0·70, and these were the baseline Mayo and 1-month-postsurgery Mayo PJI risk scores as reported by Berbari et al. [Reference Berbari19] and HPRO, which was externally validated by Lewallen et al. [Reference Lewallen21]. Calibration measures were presented for 11 risk scores (including the baseline Mayo PJI risk score) and each was reported to have satisfactory model calibration. Two studies did not report on any measures of discrimination or calibration [Reference Bozic20, Reference Tikhilov24] (Table 2).
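For readers less familiar with the C-statistic, the sketch below shows how it can be computed for a binary outcome: it is the proportion of all (event, non-event) pairs in which the individual with the event was assigned the higher predicted risk, with ties counted as one half. The risks and outcomes are toy values invented for illustration. Scores developed with Cox regression would use a time-to-event version of the index, which is not shown here.

```python
def c_statistic(risks, outcomes):
    """Concordance (C) statistic for a binary outcome.

    risks:    predicted risks, one per individual
    outcomes: 1 if the individual had the event (e.g. infection), else 0
    """
    event_risks = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevent_risks = [r for r, y in zip(risks, outcomes) if y == 0]
    if not event_risks or not nonevent_risks:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e in event_risks:
        for n in nonevent_risks:
            if e > n:
                concordant += 1.0   # event ranked above non-event
            elif e == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(event_risks) * len(nonevent_risks))


# Toy data, invented for illustration only.
risks = [0.02, 0.10, 0.05, 0.01, 0.08, 0.03]
outcomes = [0, 1, 0, 0, 1, 1]
c = c_statistic(risks, outcomes)
print(f"C-statistic: {c:.2f}")
```

A value of 0·5 corresponds to discrimination no better than chance and 1·0 to perfect discrimination, which is why values below 0·70 are generally regarded as modest.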
Model validation
Only five of the risk scores were validated internally using resampling techniques such as bootstrapping and cross-validation [Reference Geubbels17, Reference Mu18, Reference Maradit Kremers23]. These included (i) a total hip arthroplasty (THA)-specific risk model for SSI, developed using data collected from 62 acute care hospitals within the Dutch surveillance network for nosocomial infections [Reference Geubbels17]; (ii) the HPRO and KPRO [Reference Mu18] and (iii) claims-based risk models for THA and total knee arthroplasty (TKA) [Reference Maradit Kremers23]. Only the HPRO and KPRO risk scores were externally validated using an independent dataset in a different study [Reference Mu18]. Although the HPRO score performed better in the external cohort compared with the internal validation cohort, the KPRO risk score performed much less well when tested on the external cohort compared with the internal cohort (Table 2).
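The bootstrap form of internal validation mentioned above can be sketched as follows. This is an illustrative outline of optimism correction, not the procedure used by any included study: the `fit` function is a deliberately crude, hypothetical stand-in for model fitting, and the data are simulated.

```python
import random


def c_statistic(scores, outcomes):
    """Proportion of (event, non-event) pairs ranked correctly; ties count half."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    nonevents = [s for s, y in zip(scores, outcomes) if y == 0]
    total = sum(1.0 if e > n else 0.5 if e == n else 0.0
                for e in events for n in nonevents)
    return total / (len(events) * len(nonevents))


def fit(rows, outcomes):
    """Hypothetical stand-in for model fitting: weight each predictor by the
    difference in its mean between events and non-events."""
    ev = [r for r, y in zip(rows, outcomes) if y == 1]
    ne = [r for r, y in zip(rows, outcomes) if y == 0]
    return [sum(r[j] for r in ev) / len(ev) - sum(r[j] for r in ne) / len(ne)
            for j in range(len(rows[0]))]


def score(weights, rows):
    """Apply fitted weights to each row to obtain a risk score."""
    return [sum(w * x for w, x in zip(weights, r)) for r in rows]


def optimism_corrected_c(rows, outcomes, n_boot=200, seed=0):
    """Return (apparent C, optimism-corrected C) using bootstrap resampling."""
    rng = random.Random(seed)
    n = len(rows)
    apparent = c_statistic(score(fit(rows, outcomes), rows), outcomes)
    optimism = []
    while len(optimism) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]       # sample with replacement
        b_rows = [rows[i] for i in idx]
        b_out = [outcomes[i] for i in idx]
        if len(set(b_out)) < 2:
            continue                                     # need both outcome classes
        w = fit(b_rows, b_out)                           # refit in bootstrap sample
        c_boot = c_statistic(score(w, b_rows), b_out)    # performance where fitted
        c_orig = c_statistic(score(w, rows), outcomes)   # performance in original data
        optimism.append(c_boot - c_orig)
    return apparent, apparent - sum(optimism) / n_boot


# Simulated cohort: one weakly informative predictor and two noise predictors.
data_rng = random.Random(42)
rows, outcomes = [], []
for _ in range(150):
    y = 1 if data_rng.random() < 0.3 else 0
    rows.append([data_rng.gauss(0.5 * y, 1.0),
                 data_rng.gauss(0.0, 1.0),
                 data_rng.gauss(0.0, 1.0)])
    outcomes.append(y)

apparent_c, corrected_c = optimism_corrected_c(rows, outcomes)
print(f"Apparent C: {apparent_c:.3f}; optimism-corrected C: {corrected_c:.3f}")
```

The key design point is that the whole fitting step is repeated inside every bootstrap sample, so that the estimated optimism reflects all data-driven modelling decisions. External validation, by contrast, applies the already fitted model unchanged to an independent dataset.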
Performance comparisons
The performances of five risk scores were compared with existing models in three studies [Reference Geubbels17–Reference Berbari19]. Geubbels et al. compared the predictive performance of their newly developed THA-specific risk score for SSI with the NNIS (National Nosocomial Infection Surveillance) system risk index (which incorporates three risk factors of equal weight, namely wound contamination class, American Society of Anesthesiologists (ASA) score, and duration of surgery), and reported better predictive performance for the new risk score (C-index: 0·64 vs. 0·56; P < 0·001) [Reference Geubbels17]. Mu et al. also reported statistically significantly better performances for the HPRO and KPRO risk scores when compared with the traditional NHSN SSI risk model, though the C-statistics were generally low (<0·70) [Reference Mu18]. The baseline Mayo and 1-month-postsurgery Mayo PJI risk scores also performed well compared with the traditional NHSN SSI risk score (C-index: 0·72 vs. 0·64; P < 0·001) and (C-index: 0·72 vs. 0·63; P < 0·001), respectively [Reference Berbari19]. Two studies assessed the incremental prognostic value of adding further risk factors to their existing models [Reference Lewallen21, Reference Maradit Kremers23]. Lewallen et al. externally validated the HPRO and KPRO risk scores and reported that addition of information on morbid obesity and diabetes mellitus to each score modestly improved discrimination [Reference Lewallen21]. On addition of four clinical risk factors (morbid obesity, prior non-arthroplasties on the same joint, ASA score and operative time) to their claims-based risk models for THA and TKA, Maradit Kremers et al. reported improved performance (by C-statistics) for both models, though the THA model showed better performance than the TKA model [Reference Maradit Kremers23]. There was, however, no noticeable improvement in calibration for either model.
Finally, while there was an improvement in IDI (Integrated Discrimination Index) for the THA score, no significant improvement was seen for the TKA score: 0·37% (0·12% to 0·62%) and 0·09% (−0·02% to 0·21%), respectively.
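As an illustration of what an IDI value represents, the sketch below uses the common definition of the IDI as the change in discrimination slope (mean predicted risk in those with the event minus mean predicted risk in those without) when predictors are added to a model. We assume this is the definition underlying the values quoted above; the risks and outcomes here are toy values, not study data.

```python
from statistics import mean


def discrimination_slope(risks, outcomes):
    """Mean predicted risk among events minus mean predicted risk among non-events."""
    events = [r for r, y in zip(risks, outcomes) if y == 1]
    nonevents = [r for r, y in zip(risks, outcomes) if y == 0]
    return mean(events) - mean(nonevents)


def idi(old_risks, new_risks, outcomes):
    """Change in discrimination slope from the old model to the new model."""
    return (discrimination_slope(new_risks, outcomes)
            - discrimination_slope(old_risks, outcomes))


# Toy data, invented for illustration only.
outcomes = [1, 1, 0, 0, 0, 0]
old_risks = [0.04, 0.02, 0.03, 0.01, 0.02, 0.02]   # base model
new_risks = [0.06, 0.03, 0.03, 0.01, 0.01, 0.02]   # base model plus added predictors

improvement = idi(old_risks, new_risks, outcomes)
print(f"IDI: {improvement:.2%}")
```

A positive IDI means the extended model separates the average predicted risks of infected and uninfected patients more widely than the base model. A confidence interval that includes zero, as for the TKA score, indicates no demonstrable gain.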
Clinical evaluation of risk scores
None of the studies described the evaluation of the clinical effectiveness of a score in an intervention study or as part of an impact study aimed at changing patient outcomes.
DISCUSSION
Key findings
Using systematic review methods, we have reported the first overview of available risk assessment scores for SSI or PJI following joint replacement. Based on established quality criteria for risk scores [Reference Altman25, Reference Noble26], none of the risk scores in our review was judged to be promising for use in clinical settings or public health practice, except for the HPRO. The HPRO is a procedure-specific risk score which was adapted from the traditional NHSN risk index using NHSN data and is intended to predict SSI or PJI within 1 year of hip replacement [Reference Mu18]. The HPRO was found to perform better than the traditional NHSN risk index and external validation in an independent cohort showed high discriminative ability [Reference Lewallen21]. The HPRO also showed higher accuracy for predicting PJI compared with SSI. The data also show that risk prediction models for SSI or PJI have only been developed over the past 5 years. Of the 16 risk scores identified, only seven had 10 or fewer components included in the final score, with a number of scores having between 30 and 45 components. Although all 11 risk scores reporting calibration measures exhibited satisfactory calibration, only three of these risk scores were reported to have a discriminative ability of >0·70. Of all 16 risk scores, HPRO and KPRO were the only risk scores externally validated in an independent population. Quality assessment of the risk scores’ development and validation criteria showed all scores to have a high risk of bias. This was mainly due to the methodology used in assessment of predictors and outcomes, inappropriate handling of missing data, and lack of external validation.
Explanations and implications of findings
Our findings highlight the limited evidence available on appropriate risk scores for predicting SSI or PJI after joint replacement. Given the absence of an ideal risk score which can be used in a routine clinical setting, it appears that the potential value of risk scores in preventing SSI or PJI may have been underestimated in orthopaedic practice. The findings also highlight the use of poor methodology in the development of some of these risk scores. Although cross-sectional study designs were not included, the included studies were not free from bias and confounding. The majority of the designs were based on retrospective cohorts instead of prospective cohort designs, which are ideal for risk score modelling as predictor information can be ascertained blindly in relation to the outcome or disease [Reference Ensor13]. None of the risk scores was developed in a cohort recruited for this sole purpose, which introduced an inherent selection bias. A key methodological issue was the absence of clear and detailed reporting of the treatment of missing data in all studies, which is of utmost importance prior to the development of risk scores [Reference Steyerberg12]. Included studies used complete case analysis in the presence of missing data, which does not represent the entire population and reduces the sample size [Reference Ensor13]. It has been shown that risk scores that use multiple imputation produce more valid results and have better discrimination than tools that ignore such additional analyses [Reference Janssen27]. There were also concerns with usability of the risk scores, as the majority of the risk scores had more than 10 variables. It is recognised that the simplicity of the model is an important criterion for developing clinically useful risk scores [Reference Wyatt and Altman28, Reference Moons29].
Evidence suggests that complex models are more likely to provide overoptimistic predictions, especially when extensive variable selection has been performed [Reference Sauerbrei30]. Only five of the risk scores were validated internally using resampling methods, which give a good indication of how optimistic the risk score may be [Reference Steyerberg31]. Although internal validation is helpful, it cannot provide information on the model's performance elsewhere or its generalisability. Before a risk prediction tool can be used in clinical practice or in real-world settings, evaluation of its generalisability (or transportability) requires data from elsewhere – also known as external validation [Reference Steyerberg12]. However, only two risk scores were externally validated in our sample [Reference Lewallen21]. Finally, none of the risk scores was reported to have been used in an impact study aimed at changing patient outcomes. Before a risk score can be implemented, a vital criterion that needs to be fulfilled is its impact on clinical practice [Reference Steyerberg12]. Among the identified risk scores, only the HPRO was found to be potentially promising for use in a clinical setting. However, it cannot be considered ready for use as its clinical effectiveness has yet to be evaluated. The unavailability of appropriate existing risk scores for use in the clinical setting is extremely concerning. Adding to this challenge is the lack of established uniform criteria for the diagnosis of infection, especially PJI, which makes it difficult to conduct diagnostic or risk prediction studies for infection.
Although hip and knee replacements are successful elective procedures, with SSIs or PJIs being rare complications of these procedures [Reference Kurtz3, Reference Dale32], the incidence of these infections will increase in conjunction with the growing healthcare burden due to osteoarthritis [Reference Cross33] and a predicted large rise in the numbers of arthroplasty procedures [Reference Kurtz34, Reference Patel35]. To meet this challenge, there should be a clinical drive towards identification of individuals at high risk of SSIs or PJIs using risk prediction engines. The current findings should stimulate research groups to develop and evaluate appropriate infection outcome-specific risk prediction algorithms using robust methodology. The clinical effectiveness of the HPRO also needs to be evaluated before it is implemented. Within our 5-year INFORM (INFection ORthopaedic Management) Programme, the aim is to develop and establish optimum strategies for the prevention and treatment of PJIs within the UK National Health System [Reference Bellamy, Rousseau and Gardner36], which may include the development of appropriate risk prediction engines when the data allow.
Study strengths and limitations
To the best of our knowledge, this is the first systematic review to identify limited progress in the development and validation of risk prediction models for SSI or PJI following joint replacement, using robust systematic methodology. It is also the first review to assess the validity of existing risk scores based on risk of bias and applicability. Our search strategy was comprehensive and spanned multiple databases, making it unlikely that any relevant study was missed. There was variation in the definition of SSIs in the included studies, which did not allow for a head-to-head comparison of risk scores across studies. We were unable to harmonise data from contributing studies to perform a quantitative analysis, due to the heterogeneity in study designs and populations, predictors used, model types, and measures reported. Even though we tried to present the data as robustly as possible using established criteria, our conclusions might be limited due to the quality of published research and the large variability across study characteristics and methodologies.
CONCLUSION
In conclusion, available risk scores to predict SSI or PJI have been developed using poor methodology and have several limitations. The majority of these risk scores have not been externally validated and are not ideal for use in clinical settings. The HPRO is the only risk prediction tool identified to show some promise for use in a clinical setting (based on its predictive performance and having some external validation); however, it needs further validation using new data and its clinical effectiveness should be evaluated using an RCT design. A potentially effective way of tackling the increasing incidence of SSIs is early and accurate identification of individuals at high risk using established risk prediction scores, an approach which has been very effective in the area of CVD prevention. Further research is urgently warranted within the field to develop and test appropriate outcome-specific risk prediction tools.
SUPPLEMENTARY MATERIAL
The supplementary material for this article can be found at https://doi.org/10.1017/S0950268817000486
ACKNOWLEDGEMENTS
This publication is part of the INFection ORthopaedic Management (INFORM) Programme. As such it benefits from involvement of the whole INFORM team. The INFORM team includes: Simon Strange, Setor Kunutsor, Kirsty Garfield, Erik Lenguerrand, Rachael Gooberman-Hill, Drew Moore, Amanda Burston, Joanne Simon, Garry King, Michael Whitehouse, Vikki Wylde, Andrew Beswick, Ashley Blom, Sian Noble, Athene Lane, Fran Carroll (Musculoskeletal Research Unit, School of Clinical Sciences, University of Bristol, Southmead Hospital, Southmead Road, Bristol, BS10 5NB, UK); Sian Noble, Athene Lane, Fran Carroll (School of Social and Community Medicine, University of Bristol, Bristol, BS8 2PS, UK); Jason Webb, Alasdair MacGowan (North Bristol NHS Trust, Southmead Hospital, Bristol, BS10 5NB, UK); Stephen Jones (Cardiff and Vale University Health Board, Longcross Street, Cardiff, CF24 0SZ, UK); Adrian Taylor (Oxford University Hospitals NHS Trust, John Radcliffe Hospital, Oxford OX3 9DU, UK); Paul Dieppe (University of Exeter, Medical School, Exeter, EX1 2LU, UK); Andrew Toms, Matthew Wilson (Royal Devon and Exeter NHS Foundation Trust, Newcourt House, Exeter, EX2 7JU, UK); Ian Stockley (Sheffield Teaching Hospitals NHS Trust, Northern General Hospital, Sheffield, S5 7AU, UK); Ben Burston, John-Paul Whittaker (The Robert Jones and Agnes Hunt Orthopaedic Hospital NHS Foundation Trust, Oswestry, Shropshire, SY10 7AG, UK); Tim Board (Wrightington, Wigan and Leigh NHS Foundation Trust, Apple Bridge, Wigan, Lancashire, WN6 9EP, UK); and all the surgeons and nurses from the collaborating centres. This article presents independent research funded by the National Institute for Health Research (NIHR) under its Programme Grants for Applied Research program (RP-PG-1210-12005). The views expressed in this article are those of the authors and not necessarily those of the NHS, the NIHR, or the Department of Health.
DECLARATION OF INTEREST
M.R.W. reports grants from the National Institute of Health Research during the conduct of the study; grants from British Orthopaedic Association, which were outside the submitted work.