Randomised controlled trials have generally been accepted as the gold standard when deciding which interventions work in psychiatry (World Health Organization, 1991). Most randomised studies in psychiatry have investigated the effect of drug or psychotherapy interventions in tightly controlled and largely artificial experimental conditions (Hotopf et al, 1997; Thornley & Adams, 1998), while patients, clinicians and other decision-makers need to know how treatments work in the real world and whether they are cost-effective under routine conditions (Wells, 1999). Important questions relating to the organisation and delivery of mental health services are also rarely addressed in randomised trials (Gilbody & Whitty, 2002).
The need for research relating to effectiveness (rather than efficacy) has prompted a number of responses. One has been the call to conduct randomised trials in ‘real world’ settings, using pragmatic designs (Hotopf et al, 1999); another has been to synthesise various data sources using decision analysis (Lilford & Royston, 1998). A response that has been influential in the USA in the past decade involves the analysis of large databases of patient information collected in routine care settings — known as outcomes research (Anonymous, 1989; Ellwood, 1988; Wennberg, 1991).
ORIGINS OF OUTCOMES RESEARCH
The ‘outcomes’ movement emerged as a consequence of rapidly escalating costs, acceleration of the introduction of new health technologies and evidence of massive regional variations in the delivery of health care in the USA (Wennberg, 1990; Thier, 1992; Wennberg et al, 1993; Davies & Crombie, 1997). Paul Ellwood, in his 1988 Shattuck lecture (Ellwood, 1988), ushered in the modern outcomes movement and called for the routine collection of outcome measures by clinicians. He proposed that these records should be assimilated in large databases that would form a resource for clinical and health services research. Such data could eventually be used inter alia to compare existing treatments and to evaluate new technologies, thereby avoiding both the expense of clinical trials and the loss of generalisability that results from selective recruitment to conventional efficacy trials.
The Agency for Health Care Policy and Research (AHCPR), now the Agency for Healthcare Research and Quality (AHRQ), was established in the USA under public law in 1989 in order to conduct outcomes research into common medical conditions, with the establishment of patient outcome research teams (PORTs; Wennberg et al, 1993). The research programme was allocated US$6 million in its first year, rising to $63 million in 1991, with the purpose of using routine outcomes data to determine ‘outcomes, effectiveness and appropriateness of treatments’ (Anderson, 1994). It was decreed by Congress via the General Accounting Office (1992) that new primary research conducted by the PORTs was not to take the form of the traditional randomised controlled trial; rather, it was to be observational in design, utilising the vast amounts of data routinely collected on US patients. This health research policy produced a new breed of health researchers known as database analysts (Anonymous, 1989, 1992), with the motto ‘Happiness is a humongous database’ (Smith, 1997).
Outcomes research differs from traditional observational or quasi-experimental research in a number of ways. The key difference is that outcomes research evaluates competing interventions that are already used in routine care settings, using routine data collected by clinicians or by other agencies (such as insurance companies), whereas quasi-experimental studies implement interventions in one setting or in one group of patients, and compare outcomes with patients who have not been subjected to the intervention (Gilbody & Whitty, 2002). Quasi-experimental studies are therefore more like randomised trials and are considered to be clearly different in their approach and ethos from outcomes research (Aday et al, 1998). The outcomes that are studied in outcomes research are generally those that are already collected as part of routine care, although there is no reason why these cannot be extended in the light of the specific question being asked.
The application of outcomes research to UK mental health services has been advocated in psychotherapy (Barkham et al, 1998; Mellor-Clarke et al, 1999; Guthrie, 2000; Margison et al, 2000). Similarly, the pharmaceutical industry is keen to extend the method to the evaluation of new and relatively expensive drug therapies; for example, the Schizophrenia Outpatient Health Outcomes Study (SOHO), funded by Eli Lilly, aims to recruit European collaborators to collect outcomes from patients with schizophrenia who are in receipt of typical and atypical drugs. Others have urged caution (Sheldon, 1994); the principal concerns that have been expressed about outcomes research are its observational (rather than experimental) design; the poor quality of the data that are used; the inability to adjust sufficiently for case mix and confounding; and the absence of clinically meaningful outcomes in routinely collected data (Iezzoni, 1997).
This article presents the first systematic overview of the application of outcomes research in evaluating competing interventions in mental health, and discusses how this approach might meet the needs of clinicians and decision-makers.
METHOD
A search was made for all published examples of outcomes research conducted in psychiatric settings or among psychiatric populations, where two or more competing interventions were compared. (See Appendix for search strategies.)
Inclusion criteria
Reports were included if they fulfilled the following criteria:
(a) The research was conducted in a setting that was part of usual care in a healthcare system.
(b) The outcome data used were those collected routinely for all patients — either for administrative purposes or as a means of monitoring outcomes in the service being evaluated.
Exclusion criteria
We excluded studies that examined only the costs or processes of illness and health care from routinely collected data, with no linkage to the outcomes of care. For example, primary care prescription databases have been used to conduct research into newer psychotropic drugs (e.g. Donoghue et al, 1996), but since they are not linked to patient-level data and outcomes, they cannot be considered as outcomes research.
Also excluded were quasi-experimental or non-randomised evaluations of new technologies, where an intervention was implemented and outcomes measurement systems were established only in the course of its evaluation (Cook & Campbell, 1979). For example, in the PRiSM psychosis study, a quasi-experimental evaluation of a model of community care for those with severe mental illness (Thornicroft et al, 1998), districts were non-randomly allocated to implement an experimental service, and outcomes were measured under experimental and control conditions as part of the study.
Studies that only examined the relation between patient characteristics and outcome, with no direct comparison between competing treatments or health policy strategies (e.g. Rosenheck et al, 1997), were excluded, as were reports of routine outcomes measurement in practice, with no direct report of comparative service or treatment evaluations based on the data.
Data extraction
Data were extracted on the following topics: population; clinical or organisational question being asked; setting; sample size and length of follow-up; outcomes studied and their source; adjustment for case mix and confounding; and results.
RESULTS
Despite the widespread advocacy of outcomes research in health care, only nine published examples were found relating to mental health. Most of these were published in the past 3 years, suggesting that use of the design is increasing. The scope, design and analysis of the studies we identified are summarised in Table 1, and their most important characteristics are reviewed below.
| Study | Clinical problem, population and setting | Clinical or organisational question or hypothesis being examined | Source of outcomes data and sample size | Outcomes studied | Methods used in adjusting for case mix | Results |
|---|---|---|---|---|---|---|
| Medical Outcomes Study (Wells et al, 1989) | Depression (major depression, dysthymic disorder and subthreshold) being managed in family practices and by specialist health care providers | How do financing arrangements (prepayment v. fee for service) and provider speciality affect the detection, treatment and outcome of depression? | Data routinely collected by clinicians and research workers during the course of the study on 1772 patients | Detection of depression by physicians; adequacy of treatment; depressive symptoms (including HRSD scores); health status (including SF36) | Baseline demographic data and case mix measured and adjusted for (including medical comorbidity, psychiatric comorbidity and past history of depressive episodes) | Depression is generally underrecognised, inadequately treated and associated with a poor level of functioning; depression is associated with poorer-quality treatment and outcome when a prepayment plan is in place, rather than fee for service |
| Lam & Rosenheck (1999) | Severe mental illness among homeless people contacted through street outreach | Is case management as effective for homeless people contacted on the streets as for those contacted through shelters and other service agencies? | Routinely collected data from a 5-year, 18-site demonstration project which established and sought to evaluate outreach services for the homeless mentally ill; n=5431 (Randolph et al, 1997) | Depressive and psychotic symptoms; alcohol and drug misuse; housing; paid employment; social support; quality of life and service use | Those in receipt of street outreach (n=434) were compared with those receiving conventional outreach after adjusting for baseline socio-demographic differences and baseline differences in psychosis and substance misuse | Assertive outreach resulted in client improvement in 14 out of 20 outcome indicators; these benefits persisted and were similar to conventional outreach, following adjustment for case mix and confounding |
| Rosenheck et al (1999a) | Mental health service use among those enrolled in a health insurance plan following mental health spending cutbacks | Do cutbacks of mental health coverage by an insurer result in increased non-mental health service utilisation and reduced productivity? | Employee work records and health care claims data relating to 20 814 employees in a single US corporation | Mental health and non-mental health service use (number of days of in-patient and out-patient health care); health care costs; days absent from work | Baseline differences between years in terms of socio-demographic factors, employment, income and state of employment | Reduction in mental health care utilisation was accompanied by a marked increase in non-mental health care service use and costs, and in sick time |
| Leslie & Rosenheck (2000) | Individuals in receipt of US public-sector (VA) and privately insured in-patient mental health care, followed up for 6 months following discharge | Is publicly insured health care of lower quality and associated with poorer outcome compared with privately insured health care? | Routinely collected VA outcomes data on 180 000 in-patient episodes; routinely collected data on 7 million privately insured lives available on a commercial database, from which 6000 in-patient episodes were selected | Length of stay; readmission rates (14, 30 and 180 days after discharge); proportion receiving out-patient care | Adjustment made for known and measured confounders (age, gender, diagnostic category and psychiatric comorbidity); no data available on important confounders, including socio-economic status, employment, homelessness, health status and level of disability | The VA patients were older and more prone to psychiatric illness; quality indicators and outcome were poorer for VA care than for privately insured care; the results are largely impossible to interpret, given that the observed difference may be real or an artefact of case mix |
| Rosenheck et al (2000) | US patients with chronic war-related PTSD being treated in VA in-patient programmes, followed up for 4 months | Is an innovative psychosocial treatment — compensated work programme (CWP) — effective in routine care settings? | Routine data for all patients in receipt of VA in-patient mental health care, supplemented by disease-specific measures collected for all patients in receipt of care for PTSD; complete data on 542 patients in receipt of CWP, with 542 matched controls in receipt of routine or standard care for PTSD | PTSD symptoms; substance misuse; violent behaviour; employment and medical status | Matching of patients to controls on characteristics that predict participation in the intervention condition (propensity scoring; Rubin, 1997); logistic regression of baseline differences in PTSD symptom scores between CWP patients and controls | CWP had no impact on any of the outcomes measured, compared with controls, when adjusted analyses were conducted; the treatment is likely to be clinically and cost ineffective, and a formal randomised trial is not justified on the basis of this observational study |
| Melfi et al (1998) | US patients in receipt of antidepressant medication for depressive disorders | Does adherence to antidepressant treatment guidelines prevent the relapse and recurrence of depression? | Medicaid claims records for 4052 patients, classified into one of three groups according to whether they met an operational definition of compliance with treatment guidelines (a claim for four or more antidepressant prescriptions over the 6-month period following initiation of medication) | Relapse or recurrence during an 18-month follow-up period, defined as the initiation of a new antidepressant prescription, or by evidence of a suicide attempt, hospitalisation, mental health-related emergency room visit, or receipt of electroconvulsive therapy | General comorbidity adjustments made using hospitalisation for any other physical disorder, together with demographic variables; severity of depression controlled for using proxy measures, including whether an individual was seen by a mental health specialist | Patients with four or more prescriptions of antidepressants were less likely to relapse |
| Croghan et al (1999) | Depression being managed in primary care | Does specialist referral for psychotherapy improve compliance with antidepressant therapy, compared with management exclusively in a primary care setting? | A commercially available medical insurance database of linked pharmacy and medical claims data on 750 000 individuals; those with complete claims data and a new prescription of antidepressants were followed up over 12 months from initiation of prescription (n=2678) | Use of antidepressants ascertained from claims, with continuous medication use over 6 months taken as a proxy measure of effective antidepressant therapy and good outcome (Agency for Health Care Policy and Research, 1993); total health care costs measured from cost claims data | Substantial differences between those in receipt of care in primary and specialist settings in terms of age, gender and previous history of depression; previous claims, hospitalisations and diagnoses of depression used to adjust, using logistic regression | Referral to a specialist increases the chance of receiving continuous antidepressant therapy by 11% in adjusted analyses; the authors calculate cost-effectiveness ratios to achieve this benefit and conclude that continuous medication is likely to be a good proxy measure of improved outcome |
| Hylan et al (1999) | Patients in receipt of pharmacotherapy for depression in primary care settings | Is there a difference between different SSRI antidepressants in terms of patient compliance? | A commercially available medical insurance database of linked pharmacy and medical claims data on 750 000 individuals; complete episodes available on 1034 patients in receipt of a new SSRI prescription | Continuous prescription of the same antidepressant over 6 months, without dosage change or switching between different drugs or drug classes, taken as a proxy measure of a successful initial choice of antidepressant | Logistic regression of available confounders, including demographic details; severity of depression from ICD codes; comorbid drug and alcohol problems; comorbid physical disorder (counts of other ICD codes); and provider characteristics (primary care or specialist) | Patients in receipt of fluoxetine were more likely to receive continuous prescriptions over a 6-month period than those receiving sertraline or paroxetine; the authors conclude that fluoxetine is better tolerated than either comparator |
| Hong et al (1998) | US patients with relapsing schizophrenia and high levels of health care resource use | Is a newer antipsychotic (quetiapine) associated with better compliance, and therefore lower rates of rehospitalisation, when compared with conventional treatment? | Those with schizophrenia (n=1400) selected from a commercially available insurance claims database, coupled with a Medicaid claims file providing detailed health care costs and resource use on 5% of the 5 million Californian Medicaid population | Hospital readmission rates and the prevalence of high service utilisation; a power calculation was used to design a prospective randomised trial | Not applicable | The annual hospital readmission rate was 50%; a prospective randomised trial would need 182 patients per arm in order to detect a 15% reduction in readmission with 80% power |
Research questions addressed
Outcomes research has been used broadly in two areas of mental health research.
Evaluation of mental health policy, including aspects of service delivery, organisation and finance
The earliest and perhaps most important example of outcomes research in mental health is the Medical Outcomes Study (MOS) conducted by the RAND Corporation in the USA in the late 1980s (Tarlov et al, 1989; Wells et al, 1989, 1996). The design and objectives of this study were shaped by US health-care policy debates on the role of financing and reimbursement strategies in private care (fee for service v. prepayment) and on the place of speciality (secondary) care.
The researchers justified the use of observational methods in two ways. First, they claimed that the cheaper design and reduced burden on participants could maximise the number and range of collaborators and patients, particularly from non-research settings. Second, they claimed that the specific research questions precluded the use of randomisation, since the very act of randomisation would alter the functioning of existing health-care delivery systems (Wells et al, 1996).
Three other studies looked at health policy and organisation questions, such as the consequences of the withdrawal of mental health benefits from insurance plans (Rosenheck et al, 1999a), the effectiveness of services directed at homeless people (Lam & Rosenheck, 1999) and the difference in outcome between privately and publicly funded health providers (Leslie & Rosenheck, 2000).
Evaluation of new technologies
Four studies (Hong et al, 1998; Melfi et al, 1998; Croghan et al, 1999; Hylan et al, 1999) used an outcomes research design to demonstrate the worth of new antidepressant and antipsychotic medication in routine care settings. One further study (Rosenheck et al, 2000) examined the value of an innovative psychosocial intervention for those with war-related post-traumatic stress disorder (PTSD).
Source and choice of cases and outcomes
Outcomes studies can broadly be divided into those that collect data prospectively on a service-wide level, where the choice of outcomes is decided a priori and is influenced by the research question or population under examination, and those that use existing outcomes data, collected for other purposes.
The MOS is the best-known example of prospective outcomes research. The authors set out to measure patient-centred outcomes, in addition to clinician-rated depressive symptoms, within existing health care services. The enduring legacy of the MOS is that the patient-centred measures of health status developed for the study eventually evolved into the Short Form 36 (SF36; Stewart & Ware, 1992) — now the most commonly used generic measure of health-related quality of life.
A further study (Rosenheck et al, 2000) measured a number of outcomes, including disease-specific measures relating to the underlying condition (PTSD), measures of social function, health-related quality of life, and service use. This study used a large, existing data-set describing all of the 600 000 patients in receipt of mental health care from the US Department of Veterans Affairs (National Committee on Quality Assurance, 1995), supplemented with routinely collected disease-specific outcome measures for all patients in receipt of care for PTSD (Rosenheck, 1996).
All the other studies that we identified used existing outcomes already entered on large administrative databases, studying a much more limited range of outcomes. For example, studies examining the value of new antidepressant drugs in routine care settings used a commercially available medical insurance database of linked pharmacy and medical claims data on 750 000 individuals (Melfi et al, 1998; Croghan et al, 1999; Hylan et al, 1999). Cases of depression were identified retrospectively, either from a reimbursement claim for antidepressant medication or from the presence of one of six ICD codes indicative of depression (World Health Organization, 1992). This approach is hampered by the fact that antidepressant drugs are commonly prescribed for a number of conditions other than depression (Streator & Moss, 1997). Similarly, depression is consistently underidentified by clinicians (Jencks, 1985) and mislabelled or under-reported, in part as a consequence of the stigma of mental illness (Rost et al, 1994).
Commercially available administrative databases also hold no direct information about disease severity, such as scores on symptom rating scales. Disease progression, relapse or remission cannot be directly measured, and database studies are forced to use alternatives. For example, Hylan et al (1999) used continuous 6-month claims for refills of prescriptions as a proxy measure of acceptable pharmacotherapy, and therefore good outcome, ignoring the fact that patients discontinue medication for many reasons other than treatment failure.
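To make these mechanics concrete, the sketch below shows how a claims-database study of this kind might identify cases and derive a proxy outcome. It is a minimal illustration in Python: the table layout, column names, 45-day gap threshold and ICD-9 codes are our own hypothetical choices, not the actual structure of the commercial databases used in the studies reviewed here.

```python
import pandas as pd

# Hypothetical claims extract: one row per pharmacy (rx) or diagnostic (dx) claim.
claims = pd.DataFrame({
    "patient_id": [1, 1, 1, 1, 1, 1, 1, 2, 2, 3],
    "claim_date": pd.to_datetime([
        "2000-01-05", "2000-02-04", "2000-03-06", "2000-04-05",
        "2000-05-05", "2000-06-04", "2000-07-04",   # patient 1: monthly refills
        "2000-01-10", "2000-05-20",                 # patient 2: long gap in refills
        "2000-02-01"]),                             # patient 3: diagnosis code only
    "claim_type": ["rx"] * 7 + ["rx", "rx", "dx"],
    "code": ["fluoxetine"] * 7 + ["sertraline", "sertraline", "296.2"],
})

# Case identification: an ICD-9 depression code OR any antidepressant claim.
# As noted above, the drug criterion also sweeps in prescriptions issued for
# conditions other than depression.
DEPRESSION_ICD9 = {"296.2", "296.3", "300.4", "311"}   # illustrative codes only
is_case = claims["code"].isin(DEPRESSION_ICD9) | (claims["claim_type"] == "rx")
cases = claims.loc[is_case, "patient_id"].unique()

def continuous_six_months(patient_claims: pd.DataFrame,
                          max_gap_days: int = 45) -> bool:
    """Proxy outcome in the style of Hylan et al (1999): 'good outcome' is
    imputed when refill claims continue, without long gaps, for 6 months."""
    rx = patient_claims[patient_claims["claim_type"] == "rx"].sort_values("claim_date")
    if len(rx) < 2:
        return False
    gaps = rx["claim_date"].diff().dt.days.dropna()
    span = (rx["claim_date"].iloc[-1] - rx["claim_date"].iloc[0]).days
    return span >= 180 and (gaps <= max_gap_days).all()

proxy_good_outcome = {pid: continuous_six_months(grp)
                      for pid, grp in claims.groupby("patient_id")}
print(proxy_good_outcome)   # {1: True, 2: False, 3: False}
```

In such data, a patient who stops refills because of side-effects, recovery or cost is indistinguishable from a treatment failure, which is precisely the limitation noted above.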
Sample size and length of follow-up
Sample size was generally much greater than that achieved in traditional randomised trials, with a median sample size of 2678 (range 1034-20 814). Studies that recruited subjects prospectively, such as the MOS (Wells et al, 1989), achieved smaller sample sizes (n=1772) than those selecting subjects retrospectively from large, existing data-sets (Croghan et al, 1999; Rosenheck et al, 1999a; median n=4052). The median follow-up period was 6 months (range 4-48 months).
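One use of such routine data shown in Table 1 is trial design: the power calculation reported by Hong et al (1998) can be reproduced with the standard normal-approximation formula for comparing two proportions. The sketch below assumes a fall in the annual readmission rate from 50% to 35% (a 15 percentage-point absolute reduction), a two-sided 5% significance level and a continuity correction; these assumptions are ours, inferred from the figures in the table rather than stated by the authors.

```python
from math import sqrt
from scipy.stats import norm

def n_per_arm(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Sample size per arm for a two-proportion comparison
    (normal approximation with Fleiss continuity correction)."""
    z_a, z_b = norm.ppf(1 - alpha / 2), norm.ppf(power)
    p_bar = (p1 + p2) / 2
    n = ((z_a * sqrt(2 * p_bar * (1 - p_bar))
          + z_b * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
         / (p1 - p2) ** 2)
    # Continuity correction inflates n to allow for the discrete outcome.
    return n / 4 * (1 + sqrt(1 + 4 / (n * abs(p1 - p2)))) ** 2

# 50% baseline readmission v. 35% with the new drug:
print(round(n_per_arm(0.50, 0.35)))   # 182, matching the figure in Table 1
```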
Adjustment for confounding and case mix
All studies made some attempt to describe and adjust for confounding factors, typically using some form of regression analysis or propensity scoring (Rubin, 1997). Authors rarely reported each of the potentially confounding factors that were entered into their analysis — often restricting reports to those that were positive and related to outcome. However, it was clear that the ability of studies to adjust for confounding was determined by the collection or availability of suitable measures. Two studies serve to illustrate the contrast between limited and more complete adjustment for confounding.
The authors of the MOS prospectively measured a broad range of case-mix variables, including disease severity and comorbidity, in addition to traditional demographic characteristics such as age, gender and socio-economic status. This is especially important in the MOS since the type of health care provider is inextricably linked to disease severity, making unadjusted comparisons of outcome impossible to interpret. One of the more unexpected results of the MOS demonstrates the limitation of an observational approach and the need to measure and adjust for case mix and confounding. In unadjusted samples, the receipt of any treatment (antidepressant medication or counselling) was associated with a much worse 2-year outcome than the receipt of no treatment. In analyses that adjusted for baseline health differences, treated and untreated patients had a comparable 2-year outcome. In a subgroup analysis, designed to minimise unmeasured biases by restricting the analysis to those with the most severe depression, treatment was in fact associated with a significantly better 2-year outcome (Wells et al, 1996; Wells, 1999).
In contrast, outcomes studies based on administrative data are much more limited in their ability to measure and adjust for confounding. For example, in retrospective database studies of new antidepressant drugs (Melfi et al, 1998; Hylan et al, 1999), disease severity could not be measured, since severity data are not directly included in administrative records; it could only be crudely inferred from the setting in which care was given (primary v. secondary care).
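The reversal seen in the MOS, where treatment looks harmful until severity is adjusted for, is easy to reproduce in a toy simulation. The sketch below generates data in which sicker patients are both more likely to be treated and more likely to do badly (confounding by indication); a naive comparison then shows apparent harm, which a severity-adjusted regression removes. All numbers are invented for illustration, and plain regression stands in here for the propensity-scoring methods (Rubin, 1997) used in the studies themselves.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000

# Baseline severity confounds both treatment assignment and outcome.
severity = rng.normal(size=n)
treated = rng.random(n) < 1 / (1 + np.exp(-2.0 * severity))  # sicker -> treated

# True data-generating model: treatment *improves* outcome by +0.3,
# while severity worsens it.
outcome = 0.3 * treated - 1.0 * severity + rng.normal(size=n)

# Naive comparison: treatment appears harmful, as in the unadjusted MOS data.
naive = outcome[treated].mean() - outcome[~treated].mean()

# Adjusting for measured severity recovers the true benefit (~ +0.3).
X = sm.add_constant(np.column_stack([treated.astype(float), severity]))
adjusted = sm.OLS(outcome, X).fit().params[1]

print(f"naive difference:  {naive:+.2f}")     # strongly negative
print(f"adjusted estimate: {adjusted:+.2f}")  # close to +0.3
```

The catch, of course, is that adjustment works only for confounders that have been measured, which is exactly what administrative databases lack.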
DISCUSSION
Despite the enthusiasm with which outcomes research was adopted and funded in the USA, by the 1990s its value was being called into question. The US Office of Health Technology Assessment (1994) offered a stinging appraisal: ‘Contrary to the expectations expressed in the legislation establishing the AHCPR… administrative databases have generally not proved useful in answering questions about the comparative effectiveness of alternative medical treatments.’ Clearly, the superficially appealing opportunity to generate large-scale studies from readily available and existing data sources should be approached with caution. This review highlights both the strengths and the limitations of outcomes research as a method for evaluating mental health services.
Strengths of outcomes research
The criticism is often made that randomised trials are undermined by the fact that the participants form a highly selected and homogeneous group, whose health care and follow-up differ from those received by the majority of patients (Anonymous, 1994). The consequence is that it is not always possible to apply the results in clinical practice — in other words, trials lack external validity (Naylor, 1995).
One potential advantage of outcomes research is that observational data are routinely collected for all patients, so the results can be applied more generally. Further, the data are generated in routine health-care services, rather than in artificially constructed trials. Lastly, compared with time-consuming and costly randomised trials, outcomes research might deliver answers to some questions quickly, cheaply and with greater statistical power, without the need to seek ethical approval and individual patient consent. This review suggests that outcomes research in mental health has indeed realised these advantages — incorporating large numbers of subjects from real-life clinical populations and following them up for clinically meaningful periods of time.
Weaknesses of outcomes research
Ellwood's original vision of outcomes research required that a rich and clinically meaningful set of outcomes would be collected for all patients during their routine care (Ellwood, 1988). However, the feasibility and cost of such data collection has meant that the building blocks of much outcomes research (with notable exceptions) have been data that are collected as part of the administrative process (Iezzoni, 1997). These administrative data (produced by federal health providers, state governments and private insurers) contain the minimum amount of information required to fulfil an administrative function, particularly billing. They generally include little more than routine demographic data, ICD-9 diagnostic codes, details of interventions received during a hospital episode, length of stay and mortality during a hospital episode. The fundamental problem with research using these data is that the outcomes available are generally not those that we would like to study. Research becomes driven by the availability of data rather than by the need to answer specific questions, as acknowledged by one outcomes researcher: ‘I utilise data that are available. I do not start with “what is the problem and what is the outcome?” I say, “given these data, what can I do with them?”’ (Blumberg, 1991).
The other major problem with outcomes research, as with all observational research, is the problem of confounding and selection bias (Cook & Campbell, 1979; Iezzoni, 1997). The treatment that a patient receives will often be determined by a number of factors that are related to outcome, such as disease severity. Thus patients will differ in many ways other than the treatment they receive, and it is therefore difficult to attribute any differences in outcome to the treatment itself (Green & Byar, 1984).
Our review suggests that, in mental health, large-scale studies using ‘humongous databases’ are largely achieved at the expense of clinically meaningful outcomes, with limited opportunities to adjust for confounding. Only two studies stand out as having collected a broad range of clinically important outcomes and case-mix variables, reflecting not just disease severity but also service use and health-related quality of life — the MOS (Wells et al, 1989) and Rosenheck's study of PTSD (Rosenheck et al, 1999b).
Can outcomes research ever be useful in the UK?
Professor Nick Black has recently called for the establishment of large-scale, high-quality clinical databases across all disciplines in the UK (Black, 1999). The most ambitious example of this work in the UK has been in the field of intensive care (Rowan, 1994). According to Black, such databases need not be seen as an alternative to the randomised trial, but rather as a complement. The attractions for researchers include the possibility of generating large samples from many participating centres, and of including clinically important subgroups of patients who might be excluded from traditional trials. Outcomes research can also be used to promote rather than replace randomised trials in a number of ways. First, raising the level of uncertainty among clinicians as to the effectiveness of established interventions might increase their likelihood of participating in a randomised trial. Second, it could provide a permanent infrastructure for mounting multi-centre trials. Finally, the adoption of such databases means that research would no longer be the preserve of a minority of clinicians working in specialist centres, thus enhancing the generalisability of the results.
How feasible are such developments in mental health research in the UK?
The absence of a centralised administrative data-collection system in the UK has meant that the building blocks of outcomes research have never developed to the extent that they have in the USA. Initiatives to ensure that uniform outcomes are collected for all patients, such as the Health of the Nation Outcome Scales (Wing, 1994), have been proposed but have not so far been adopted in routine practice (Slade et al, 1999). Consequently, the adoption of routine outcomes monitoring will entail substantial effort.
Research initiatives are under way; for example, the Centre for Outcomes Research and Effectiveness (CORE) has been established under the auspices of the British Psychological Society (Clifford, 1998) in order to generate ‘practice-based evidence’ of effectiveness framed within routine services (Margison et al, 2000). At this juncture, it would be timely to learn from the examples of outcomes research in the USA, and to recognise both the limitations and the potential of the approach.
Rosenheck et al (1999b), who provided one of the more rigorous examples of outcomes research, outlined several ingredients of a successful clinical database, capable of producing rigorous and informative research. Outcomes databases should:
(a) include large numbers of subjects;
(b) use standardised instruments that are appropriate for the clinical condition being treated;
(c) measure outcomes in multiple relevant domains;
(d) include extensive data in addition to outcomes measures, in order to support matching;
(e) collect data at standardised intervals after a sentinel event such as entry to or discharge from hospital;
(f) take aggressive steps to achieve the highest possible follow-up rates.
Data should also be collected prospectively if they are to meet these aims.
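As a concrete, purely hypothetical illustration of these ingredients, a minimal record structure for such a database might look like the sketch below; the field names and placeholders are our own, not a specification from Rosenheck et al (1999b).

```python
from dataclasses import dataclass, field
from enum import Enum

class SentinelEvent(Enum):
    """Ingredient (e): assessments are anchored to standardised events."""
    ADMISSION = "admission"
    DISCHARGE = "discharge"

@dataclass
class OutcomeAssessment:
    """One standardised assessment at a fixed interval after a sentinel event."""
    event: SentinelEvent
    days_after_event: int   # e.g. 0, 90, 180; collected prospectively
    instrument: str         # ingredient (b): condition-appropriate scale
    domain: str             # ingredient (c): symptoms, functioning, quality of life
    score: float

@dataclass
class PatientRecord:
    patient_id: str
    # Ingredient (d): extensive baseline data to support matching and
    # case-mix adjustment (demographics, severity, comorbidity, ...).
    baseline: dict = field(default_factory=dict)
    assessments: list = field(default_factory=list)
    # Ingredient (f): follow-up status recorded so that attrition can be
    # pursued aggressively and reported honestly.
    lost_to_follow_up: bool = False

# Ingredient (a) is then a matter of scale: the database is simply a large
# collection of such records, e.g. records: list[PatientRecord].
```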
Such databases are going to require substantial time, effort and expense to establish, making outcomes research far from the quick and cheap research option that was envisaged. For example, the whole MOS cost US$12 million, and the depression component cost about US$4 million (Wells et al, 1996). Outcomes research also requires resolution of the practical and ethical problems of using clinical data for study purposes, as highlighted in recent debates about the Data Protection Act, the European Human Rights Act and the Health and Social Care Bill (Al-Shahi & Warlow, 2000; Medical Research Council, 2000; Anderson, 2001; Kmietowicz, 2001).
The pharmaceutical industry is especially keen to use outcomes research to examine the effectiveness of its products. This review highlights the fact that, so far, outcomes studies conducted by the pharmaceutical industry have been generally of poor quality and do not adhere to the sensible recommendations outlined by Rosenheck et al (1999b). The use of this method has clear advantages for the industry — particularly in terms of cost. In conducting such research, the industry can claim that expensive (pragmatic) randomised trials are no longer needed in order to examine clinical and economic effectiveness in routine care settings; nor will it have to provide and dispense drugs for the many thousands of patients included in these studies. Informed consent and ethical approval may no longer be required, since treatment is received as part of usual care and the outcomes are those that are collected anyway. Large-scale outcomes studies that are currently in progress — such as the SOHO study — will need to demonstrate that they are methodologically robust and that their results are believable.
Mental health researchers must give careful thought to how outcomes databases should be constructed, how resources might be put in place, and to what extent informed consent is required for research conducted using these data. Outcomes research should not be seen as an alternative to randomised controlled trials, but rather as a complement. Clinicians do not generally like collecting standardised data for each and every patient (Walter et al, 1996a,b; Slade et al, 1999). It would be unfortunate if outcomes research came to be regarded simply as a quick and flawed solution to the many political and clinical problems in mental health.
CLINICAL IMPLICATIONS
• Robust evidence is needed of the effectiveness of new and existing treatments, interventions and policy initiatives in mental health.
• Randomised trials have formed the ‘gold standard’ of this evidence but are subject to many limitations.
• Outcomes research has the potential to provide ‘real world’ evidence of clinical and economic effectiveness, relatively quickly and cheaply, using routinely collected data from clinical services.
LIMITATIONS
• Outcomes research uses an observational design and is subject to many limitations — principally bias and confounding.
• The quality of the data upon which outcomes research is based is often poor.
• Successful outcomes research depends upon the routine collection of diverse and clinically meaningful outcomes, which requires substantial effort and cost.
APPENDIX
Search terms
The following bibliographic databases were searched: Medline (1966-2000); EMBASE (1981-2000); Cinahl (1982-2000); PsycLit (to 2000). In addition, we hand-searched a number of key journals and scrutinised reference lists for additional studies; we contacted key authors in the field. Our search included the following terms.
“HEALTH STATUS-INDICATORS”; “OUTCOME-AND-PROCESS-ASSESSMENT-(HEALTH-CARE)”/all subheadings; “OUTCOME-ASSESSMENT-(HEALTH-CARE)”/all subheadings; (OUTCOME MEASURE*) in ti, ab; (HEALTH OUTCOME*) in ti, ab; (QUALITY OF LIFE) in ti, ab; MEASURE* in ti, ab; ASSESS* in ti, ab; (SCORE* or SCORING) in ti, ab; INDEX in ti, ab; “OUTCOMES-RESEARCH”/all subheadings; HEALTH OUTCOME* in ti, ab; SCALE* in ti, ab; MONITOR* in ti, ab; ASSESS* in ti, ab; OUTCOME* in ti, ab; explode “TREATMENT-OUTCOMES”; explode “PSYCHOLOGICAL-ASSESSMENT”; “QUALITY-OF-LIFE”; (OUTCOME* or PROCESS*) near3 ASSESSMENT*; HEALTH STATUS INDICATOR*; HEALTH STATUS; HEALTH OUTCOME* in ti, ab; QUALITY OF LIFE in ti, ab
Acknowledgements
The authors are grateful to Kate Misso for performing all literature searches.