In the light of rising health care expenditure and constrained budgets, economic considerations are increasingly playing a role in health care decision-making. For example, economic evaluations of alternative pharmaceutical technologies are being used to inform formulary, government and other health care resource allocation decisions (Task Force on Principles for Economic Analysis of Health Care Technology, 1995; Drummond, 1998). Allocation of scarce resources among competing ends is a central focus of economics. At its core is the idea that individuals' behaviour (how people react to different prices, choices, constraints, incentives and trade-offs) can affect economic outcomes. Thus, studies of economic outcomes within the context of individuals' behaviour in real-world settings are required when comparing the value of alternative health care interventions. Because of the importance of these behavioural aspects, economic evaluations conducted within the context of randomised clinical trials may not, on their own, be sufficient to inform decision-making. This is because individuals' behaviour in randomised clinical trials is often controlled through a strict, protocol-driven environment. Put another way, the high degree of internal validity required of randomised clinical trials to establish safety and efficacy for regulatory purposes may reduce the external validity or generalisability of these studies for making economic decisions.
Numerous observational studies, both prospective and retrospective, have examined the economic outcomes associated with alternative antidepressants (Hylan et al, 1998a). Prospective, naturalistic economic clinical trials have been proposed as a study design that marries features of a clinical trial (i.e. the randomisation) with features of clinical practice (i.e. the observation of usual care) (Simon et al, 1995). Yet, because of inclusion criteria and other considerations, these studies may still not be generalisable to broader populations. Retrospective studies using large administrative databases offer quick access to large samples of patients in naturalistic settings. Although the ability to observe patients in naturalistic settings improves the generalisability of study findings, a variety of confounding factors may introduce sources of bias in the estimated treatment effect. To this end, a number of economic analyses of antidepressants have used methods designed to mitigate such bias (Hylan et al, 1998a).
ECONOMIC EVALUATIONS OF ANTIDEPRESSANTS
At least five different study designs have been used for economic evaluations of antidepressants: randomised clinical trials, meta-analyses, decision-analytic models (which often use data from clinical trials), retrospective database studies, and prospective, naturalistic economic trials (Hylan et al, 1998a). Randomised clinical trials, meta-analyses and decision-analytic models, which rely upon data from controlled trials, may not capture the economic outcomes of different antidepressants as they are used in actual clinical practice. The benefits and considerations of these methods as they have been applied to economic studies of antidepressants and a review of the studies are described more fully elsewhere (Hylan et al, 1998a). To summarise, controlled trials have a high degree of internal validity; meta-analyses and decision-analytic models using data from clinical trials can extend health economic analyses not possible with only a single trial. Clinical trials attempt to hold constant the behavioural and health-system effects in order to evaluate the safety and efficacy of a particular treatment against one or more comparators. Yet it is the behaviour of patients and providers interacting with the characteristics of a drug technology that ultimately leads to variability in clinical outcomes and expenditures between treatments in clinical practice (Simon et al, 1996). Study participation criteria and other design characteristics of clinical trials can limit the external validity or generalisability of the outcomes observed in controlled clinical trials. For example, it is well known that the use of antidepressants in clinical practice is very different from that observed in controlled trials (Donoghue et al, 1996; Demyttenaere, 1998; Wilde & Benfield, 1998). Thus, real-world efficacy and, in turn, economic outcomes are unlikely to be the same as those projected by controlled trials.
NATURALISTIC ECONOMIC OUTCOME STUDIES OF ANTIDEPRESSANTS
Prospective studies
Two important contributions of naturalistic studies using data collected from clinical practice are the identification of associations between antidepressant use and economic outcomes, and the analysis of these outcomes in the context of observed patient and provider behaviour. To date, only one prospective, randomised, naturalistic economic clinical trial has compared alternative antidepressant therapies (Simon et al, 1996, 1999). This study compared 536 patients randomised initially to receive either fluoxetine 20 mg (n=173) or a tricyclic antidepressant (TCA): desipramine (n=181) or imipramine (n=182). After randomisation, treatment was provided at the clinicians' discretion; that is, treatment was naturalistic. The study found that 6-month clinical efficacy measures (depressive symptom scores) and total direct expenditures were not statistically different between patients who began therapy on fluoxetine, desipramine or imipramine.
Hybrid naturalistic trials like this one have garnered attention as a potential solution to the dilemma of maximising both external and internal validity in pharmaco-economic analysis. However, they have limitations and therefore should be viewed as complementary to, rather than replacements for, the data that comes from randomised clinical trials, meta-analyses, decision-analytic models and retrospective studies.
Retrospective studies
In contrast to the scarcity of prospective studies on the treatment of depression, a large number of retrospective studies have been conducted. These studies often use large administrative databases such as insurance claims or other electronic records of patients' resource utilisation. When total health care expenditures are considered, the various retrospective database studies are remarkably consistent. In particular, virtually all studies have found that total direct health care treatment expenditures for patients starting therapy on selective serotonin reuptake inhibitors (SSRIs) are equal to or lower than those for patients who start therapy on TCAs (Sclar et al, 1994, 1995; Skaer et al, 1995, 1996; Forder et al, 1996; Croghan et al, 1997; Melton et al, 1997; Obenchain et al, 1997; Revicki et al, 1997; Crown et al, 1998a,b; Hylan et al, 1998b; Simon & Fishman, 1998; McCombs et al, 1999; Treglia et al, 1999). These findings demonstrate that the higher drug acquisition costs for SSRIs relative to TCAs are at least offset, and in some cases more than offset, by lower expenditures for health care services other than antidepressant pharmacotherapy.
In contrast to the uniformity of conclusions just described, the concern is often raised that database studies arrive at contradictory conclusions and that these disparities appear to be related to the sources of funding for the studies. These differences are more apparent than real and arise from the fact that the studies often evaluate different measures of health care expenditure (e.g. mental health expenditure or antidepressant expenditure). When viewed from the perspective of total direct health care expenditure, the vast majority of studies support a common conclusion: that differences in antidepressant acquisition costs are at least offset, and in some cases more than offset, by savings in other areas. This finding is consistent across comparisons of TCAs and SSRIs and between the SSRIs (Russell et al, 1999).
Total direct health care expenditure can provide a more comprehensive assessment of economic outcome than narrower measures. Studies that focus on drug costs alone are less useful because the results of these studies are driven by the acquisition costs, which may not capture the full consequences of the initial drug selection. In the absence of a link between treatment costs and clinical outcome, inexpensive but ineffective drugs, with serious side-effect profiles that cause early discontinuation of therapy, may appear to be less costly than more expensive medications that are used more effectively. Studies that look just at antidepressant prescription volume and expenditure (Smith & Sherrill, 1996; Singletary et al, 1997; Viale, 1998) find differences between the antidepressants that are highly correlated with their unit cost. However, such conclusions may be incorrect if the patterns of antidepressant use lead to differences in broader health care resource utilisation such as concomitant pharmaceutical prescribing, physician visits or hospitalisations.
Studies that look only at depression-related or mental health care expenditure suffer from the same general criticism, although to a lesser degree. For this reason, guideline panels have recommended that pharmaco-economic studies should use the broadest measures of expenditure available (Task Force on Principles for Economic Analysis of Health Care Technology, 1995). Although some payers may face only pharmaceutical costs or mental health care treatment costs, it is still important to consider the impact of initial treatment selection on total direct health care expenditure. Treatments that appear to result in lower antidepressant costs initially may raise expenditure in other parts of the health care system. This could raise health care expenditure overall and have unintended consequences. For example, attempts by a health plan to minimise expenditure on antidepressant therapy by prescribing TCAs as first-line therapy may actually increase expenditure if patients experience higher rates of depression relapse as a result. Considering the total health care expenditure associated with initial treatment selection is also consistent with providing care to the greatest number of recipients for a given budget, an objective for many health care systems.
STATISTICAL METHODS FOR REDUCING THE EFFECTS OF SELECTION BIAS IN RETROSPECTIVE STUDIES
Retrospective studies that evaluate the economic outcomes of alternative treatments have been widely criticised because of their failure to control for the selection bias effects of unobserved variables that might correlate with both treatment selection and outcomes (Anderson, 1994; Crown et al, 1998b). In the case of antidepressants, the patient's medical history (e.g. previous treatment response and underlying disease severity) and physician characteristics (e.g. prescribing preferences and prior experience with different antidepressants) may influence both the choice of initial antidepressant and the subsequent outcome (prescribed dose, resource utilisation, etc.). If these factors are unobserved and are also significant determinants of outcome, failure to account for them can result in biased estimates of the impact of drug treatment on outcome. This is, indeed, a serious issue because antidepressant selection bias can result in erroneous inferences about the magnitude and statistical significance of treatment effects. Randomised controlled trials deal with this problem of sample selection bias by evenly distributing the effects of unobserved factors among treatment arms.
Statistical methods can be applied to retrospective data to help control for both observed and unobserved factors correlated with initial treatment selection and subsequent outcomes. These methods use the data to construct variables that can be used to control for the effects of unobserved factors such as underlying disease severity or physician prescribing patterns. Using observable data to construct variables to act as proxies for unobserved factors is not new in the area of outcomes research (Von Korff et al, 1992; Johnson et al, 1994). Three broad statistical modelling approaches are considered here: instrumental variables, parametric selection bias methods and propensity score models.
Instrumental variables
The instrumental variables technique is widely used by econometricians to correct for a variety of statistical problems in regression analysis, most notably simultaneous equations bias and errors in measurement (e.g. Kennedy, 1992; Greene, 1993). All such problems have the characteristic that the explanatory variables are correlated with the error terms of the estimated equations. An instrumental variable is one that is highly correlated with the variable for which it is intended to serve as an instrument, without being correlated with the error terms. Non-random selection into treatment groups essentially results in a problem of missing variables, or measurement error, in the statistical model. The effect of unobserved variables that are important for explaining drug selection will be captured by the error term of the drug selection equation. If, as a result, the error term of the drug selection equation is correlated with treatment outcomes, estimates of the treatment effect will be biased. A major difficulty with the implementation of instrumental variables techniques is the challenge of finding variables that are highly correlated with the variable of interest (e.g. drug selection) but uncorrelated with the unobserved determinants of the outcome (e.g. treatment cost), so that they influence the outcome only through the variable of interest.
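As an illustration only, the logic of instrumental variables can be sketched with simulated data. The Python example below is not drawn from any study reviewed here: the simulated instrument `z`, the unobserved `severity` term, the function names and all numerical values are assumptions, chosen to show how two-stage least squares can recover a treatment effect that a naive regression misses when an unobserved factor drives both drug selection and cost.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y = X b + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_stage_least_squares(y, d, z):
    """2SLS with one endogenous treatment d and one instrument z."""
    const = np.ones(len(y))
    Z = np.column_stack([const, z])
    d_hat = Z @ ols(Z, d)                      # stage 1: treatment predicted from the instrument
    X_hat = np.column_stack([const, d_hat])
    return ols(X_hat, y)[1]                    # stage 2: coefficient on predicted treatment

# Hypothetical simulated cohort (all values are illustrative assumptions).
rng = np.random.default_rng(0)
n = 50_000
severity = rng.normal(size=n)                  # unobserved by the analyst
z = rng.normal(size=n)                         # instrument: shifts drug choice, not cost
d = (0.8 * z + severity + rng.normal(size=n) > 0).astype(float)
true_effect = -100.0                           # assumed effect of treatment on cost
y = 1000.0 + true_effect * d + 300.0 * severity + rng.normal(scale=50.0, size=n)

# Naive regression of cost on treatment ignores severity and is biased.
naive = ols(np.column_stack([np.ones(n), d]), y)[1]
iv = two_stage_least_squares(y, d, z)
print(f"naive estimate: {naive:.1f}, IV estimate: {iv:.1f}, assumed truth: {true_effect:.1f}")
```

In this simulated setting the naive estimate is pushed upward by the unobserved severity term, whereas the instrumental variables estimate should lie close to the assumed true effect. The standard errors needed for formal inference are omitted from the sketch.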
To date, the instrumental variables approach has not been used in any published study of depression. However, in landmark studies McClellan and colleagues (McClellan et al, 1994; McClellan, 1995) demonstrated the application of instrumental variables to mortality outcomes for elderly patients with acute myocardial infarction. These studies used differences in distance from treatment centres as an instrumental variable to control for the confounding effects of unobserved case-mix variation. The successful application of parametric selection bias models in depression studies suggests that instrumental variables techniques will soon find their way into the depression literature.
Parametric selection bias methods
Parametric sample selection models are closely related to the instrumental variables approach. Originally developed in the econometrics literature to assess labour market outcomes and the effects of job and education training programmes (Heckman, 1976, 1979; Heckman & Smith, 1995), these models have been increasingly applied to economic evaluations of health care utilisation (e.g. Dowd et al, 1996; Hylan et al, 1997, 1998b; Crown et al, 1998a,b). Sample selection models use a two-stage econometric approach to construct a variable that controls for the bias due to unobserved factors associated with treatment selection.
The estimation of sample selection models proceeds in two stages. In the first stage, a model of treatment selection is estimated. From this model, the errors in correctly predicting treatment selection are used to construct an adjustment factor, λ, calculated for each patient. In the second stage, the adjustment factor is included as one of the explanatory variables in the outcome model.
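One common way of writing the two stages, in the form usually associated with Heckman (1979), is set out below. The notation is an assumption introduced here for illustration: a probit first stage, with $D_i$ the treatment indicator, $Z_i$ the determinants of treatment selection, $X_i$ the covariates of the outcome equation, $Y_i$ the outcome, and $\phi$ and $\Phi$ the standard normal density and distribution functions. The individual studies reviewed may have used other specifications.

```latex
% Stage 1: probit model of treatment selection
\Pr(D_i = 1 \mid Z_i) = \Phi(Z_i \gamma)

% Adjustment factor built from the fitted first stage
\lambda_i =
\begin{cases}
  \dfrac{\phi(Z_i \hat{\gamma})}{\Phi(Z_i \hat{\gamma})}      & \text{if } D_i = 1, \\[2ex]
  -\dfrac{\phi(Z_i \hat{\gamma})}{1 - \Phi(Z_i \hat{\gamma})} & \text{if } D_i = 0.
\end{cases}

% Stage 2: outcome equation with the adjustment factor as a regressor
Y_i = X_i \beta + \delta D_i + \theta \lambda_i + \varepsilon_i
```

Under these assumptions, $\delta$ is the treatment effect of interest and $\theta$ is the coefficient on the adjustment factor.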
Including λ in the outcome (e.g. expenditure) equation helps control for underlying differences across the patient group in their probability of receiving the selected antidepressant. A feature of sample selection models is that the adjustment factor permits a direct test of whether selection bias is present and if so, what the direction of its impact is. Specifically, if the coefficient on the adjustment factor, λ, in the outcome equation is statistically significant, this indicates that selection bias is present and that the results of the treatment effect would have been biased had the adjustment not been made. The sign on the adjustment factor also indicates the direction in which the results would have been biased.
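The two stages and the role of the coefficient on λ can be sketched with simulated data. The Python example below is a minimal illustration under stated assumptions, not the procedure of any study cited here: it assumes a probit first stage fitted by Fisher scoring, jointly normal errors, and hypothetical variables (`z` as a selection-only variable, `x` as an observed covariate) with arbitrary numerical values.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def norm_cdf(x):
    return 0.5 * (1.0 + _erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    return np.exp(-0.5 * x**2) / math.sqrt(2.0 * math.pi)

def fit_probit(Z, d, iterations=25):
    """Stage 1: probit coefficients for treatment selection (Fisher scoring)."""
    g = np.zeros(Z.shape[1])
    for _ in range(iterations):
        idx = Z @ g
        p = np.clip(norm_cdf(idx), 1e-10, 1 - 1e-10)
        w = norm_pdf(idx) / (p * (1 - p))
        score = Z.T @ (w * (d - p))
        info = (Z * (w * norm_pdf(idx))[:, None]).T @ Z
        g = g + np.linalg.solve(info, score)
    return g

def adjustment_factor(Z, d, g):
    """Per-patient lambda: inverse Mills ratio, signed by treatment received."""
    idx = Z @ g
    p = np.clip(norm_cdf(idx), 1e-10, 1 - 1e-10)
    return np.where(d == 1, norm_pdf(idx) / p, -norm_pdf(idx) / (1 - p))

# Hypothetical simulated cohort (all values are illustrative assumptions).
rng = np.random.default_rng(1)
n = 20_000
x = rng.normal(size=n)                          # observed covariate
z = rng.normal(size=n)                          # affects selection only
u = rng.normal(size=n)                          # unobserved selection error
rho, sigma, delta_true = 0.6, 100.0, -50.0
e = sigma * (rho * u + math.sqrt(1 - rho**2) * rng.normal(size=n))
d = (1.0 * z + 0.5 * x + u > 0).astype(float)   # treatment actually received
y = 500.0 + 40.0 * x + delta_true * d + e       # outcome, e.g. expenditure

const = np.ones(n)
Z = np.column_stack([const, z, x])
lam = adjustment_factor(Z, d, fit_probit(Z, d))

# Stage 2: outcome regression without and with the adjustment factor.
naive = np.linalg.lstsq(np.column_stack([const, x, d]), y, rcond=None)[0]
adjusted = np.linalg.lstsq(np.column_stack([const, x, d, lam]), y, rcond=None)[0]
delta_naive, delta_adj, lam_coef = naive[2], adjusted[2], adjusted[3]
print(f"naive: {delta_naive:.1f}, adjusted: {delta_adj:.1f}, lambda coefficient: {lam_coef:.1f}")
```

In this simulation the coefficient on λ should be near `rho * sigma`, and its positive sign matches the upward bias of the unadjusted estimate. A formal significance test would need standard errors that allow for the estimated first stage, which the sketch omits.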
As with the instrumental variables approach, however, the estimation of parametric selection bias models requires the identification of variables that are correlated with treatment selection but uncorrelated with outcomes. Recently, analysts have proposed several such variables that seem to work reasonably well for sample selection models. For example, the time between the launch date for a particular pharmaceutical product and the date of the prescription (as a proxy for the diffusion of information about the product to physicians) seems to be a good predictor of treatment choice, but is uncorrelated with treatment costs.
Propensity score models
Propensity score analysis has received growing attention as a methodology for reducing the bias due to unobserved differences in treatment groups (Rosenbaum & Rubin, 1984; Robins et al, 1992; Drake & Fisher, 1995). As with sample selection models, the method of propensity scores involves first estimating the conditional probability of receiving a particular treatment. Patients are then sorted into groups with similar probabilities, or propensity scores. Finally, treatment effects are evaluated within each of the patient sub-populations with similar propensity scores. The propensity score approach attempts to deal with the effects of unobserved variables by matching recipients and non-recipients who have similar predicted probabilities of receiving treatment based on observed variables. Typically, this matching is done for several groups (for example, those with low predicted probabilities, mid-range predicted probabilities and high predicted probabilities). The effect of treatment on outcomes is assessed for each of these subsamples and the results are then combined to determine the overall effect of treatment on the outcome of interest.
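The stratification procedure just described can be sketched as follows. This Python example is illustrative only and rests on assumptions not taken from the source: a logistic model for the propensity score, five strata defined by quantiles of the score, a single observed confounder `x`, and simulated values throughout.

```python
import numpy as np

def fit_logit(X, d, iterations=25):
    """Logistic regression coefficients by Newton's method."""
    b = np.zeros(X.shape[1])
    for _ in range(iterations):
        p = 1.0 / (1.0 + np.exp(-(X @ b)))
        grad = X.T @ (d - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        b = b + np.linalg.solve(hess, grad)
    return b

def stratified_effect(y, d, score, n_strata=5):
    """Size-weighted average of within-stratum treated-minus-control differences."""
    edges = np.quantile(score, np.linspace(0, 1, n_strata + 1))
    stratum = np.clip(np.searchsorted(edges, score, side="right") - 1, 0, n_strata - 1)
    total, weight = 0.0, 0
    for s in range(n_strata):
        in_s = stratum == s
        treated, control = in_s & (d == 1), in_s & (d == 0)
        if treated.any() and control.any():      # skip strata lacking a comparison group
            total += in_s.sum() * (y[treated].mean() - y[control].mean())
            weight += in_s.sum()
    return total / weight

# Hypothetical simulated cohort (all values are illustrative assumptions).
rng = np.random.default_rng(2)
n = 20_000
x = rng.normal(size=n)                           # observed confounder
d = (rng.random(n) < 1.0 / (1.0 + np.exp(-x))).astype(float)
true_effect = -100.0
y = 1000.0 + 200.0 * x + true_effect * d + rng.normal(scale=50.0, size=n)

X = np.column_stack([np.ones(n), x])
score = 1.0 / (1.0 + np.exp(-(X @ fit_logit(X, d))))   # estimated propensity score

naive_diff = y[d == 1].mean() - y[d == 0].mean()
stratified_diff = stratified_effect(y, d, score)
print(f"naive: {naive_diff:.1f}, stratified: {stratified_diff:.1f}, assumed truth: {true_effect:.1f}")
```

Because patients within a stratum still differ somewhat in `x`, stratification removes most but not all of the bias in this simulation. Because the confounder here is observed, the sketch says nothing about how the method performs when important factors are not recorded.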
The propensity score approach avoids the necessity of specifying variables that are correlated with treatment selection but uncorrelated with outcomes, which is the fundamental challenge of the instrumental variables and sample selection models. However, it does not provide a direct test of the presence of selection bias, nor does it provide an estimate of the magnitude of selection bias if it is present. None the less, if a test for selection bias is not required, Angrist (1997) has argued that the propensity score process of matching patients with similar probabilities of receiving particular treatment accomplishes much the same thing as sample selection models. Obenchain & Melfi (1998) compared the propensity score method to the parametric sample selection approach using as an application differences in total health care expenditure for patients treated with a TCA or fluoxetine. Although the two methods resulted in similar conclusions about the economic differences between the two drugs, Obenchain & Melfi concluded that the propensity score method was the preferred approach because it is easier to understand and explain, and less sensitive to underlying model assumptions.
Application of these methods
Of the three methods discussed above, the parametric sample selection approach has been most widely applied in depression studies. A number of studies have applied the two-stage sample selection method to assess differences in total health care expenditure between TCAs and SSRIs (Croghan et al, 1997; Crown et al, 1998a,b). These studies consistently find that differences in antidepressant acquisition costs are offset (and in some cases more than offset) by broader measures of health care resource utilisation.
Other studies have extended the two-stage sample selection model to look at additional outcomes, including differences in the number of benzodiazepine co-prescriptions between different SSRIs (Hylan et al, 1997; Treglia et al, 1998). These studies find that after controlling for unobserved factors that may be correlated with initial SSRI selection, patients who start therapy on paroxetine have a higher rate of benzodiazepine prescriptions than patients who start therapy on fluoxetine. Of course, the higher co-prescribing of benzodiazepines with paroxetine could be the result of treatment for comorbid anxiety disorder. Paroxetine patients might be more likely to have comorbid anxiety and depression because paroxetine is indicated for both conditions. In reality, statistical approaches may never be able to control fully for such biases. However, controlling for the non-random selection into initial treatment is particularly important in depression studies, because selection of a particular drug may be influenced by marketing initiatives, previous treatment non-response and comorbid conditions (such as anxiety).
Corroborating prospective studies are useful to confirm the findings of retrospective studies. It is interesting to note the similarity in economic outcomes between fluoxetine and TCAs found in the prospective study of Simon et al (1996) and the retrospective study of Croghan et al (1997), which used statistical methods to control for potential biases due to non-randomisation. Both of these studies found that the total direct health care expenditure for fluoxetine equalled that for the TCAs; patients who began therapy on fluoxetine, however, were more likely to have a dose and duration of therapy consistent with recommended treatment guidelines. Similar prospective studies comparing the SSRIs and other new antidepressants are necessary to provide further corroboration of the findings from retrospective databases.
CONCLUSION
Because the external validity of randomised clinical trials is limited, observational studies using data from clinical practice are useful to complete our understanding of the economic outcomes of alternative treatments. Like all study designs, prospective and retrospective studies have their own strengths and limitations. No single study or design can hope to provide definitive results. Rather, it is desirable to assess economic outcomes across a broad range of study designs as well as health care settings. Findings that are consistent across a number of study designs and environments may be regarded with greater confidence.
Retrospective database studies have been commonly used to assess the economic outcomes of alternative antidepressants in clinical practice. Because of the potential selection bias inherent in non-randomised samples, it is important to control for the potential presence of unobserved factors that may be correlated with the initial treatment selection and economic outcomes. Because the presence or absence of selection bias is ultimately an empirical issue, it is important to test for it in retrospective studies. We have yet to realise the full application of existing statistical methods to retrospective studies in health care technology evaluation, particularly as applied to antidepressants. The methods discussed in this review may also be applicable to evaluating different antipsychotic agents in the treatment of schizophrenia.
Retrospective studies comparing the economic outcomes of alternative antidepressants have consistently found that differences in drug acquisition costs appear to be offset — and in some cases more than offset — by differences in broader measures of health care resource utilisation. This suggests that antidepressant acquisition costs are not a good predictor of total direct health care expenditure. As a consequence, decisions based on antidepressant acquisition costs alone may result in unintended clinical and economic outcomes. It is necessary for health care decision-makers to take a broader perspective when making decisions about paying for depression treatment. This broader budgetary perspective is consistent with an objective of providing the greatest benefit for a given population and budget, an approach that maximises the value of health care spending.