Van den Brink et al (2001) studied general practitioners' (GPs') prognostic predictions for depression and general anxiety. They found that the GPs' prognoses were generally more pessimistic than the observed course and fell short of the performance attained by a statistical model based on baseline variables. I would like to express three concerns about the technical details of this article.
First, the kappas they report are Cohen's kappas, whereby disagreement between “full recovery within 6 months” and “partial recovery” is penalised as heavily as disagreement between “full recovery within 6 months” and “no recovery”. Clinically, however, the former is clearly less grave than the latter. More appropriate statistics would be weighted kappas, which are 0.31 (95% CI 0.15-0.46) for GP prognosis for depression, 0.35 (95% CI 0.16-0.54) for GP prognosis for anxiety, 0.56 (95% CI 0.43-0.70) for model prognosis for depression and 0.51 (95% CI 0.33-0.69) for model prognosis for anxiety. These figures are appreciably larger than those originally reported.
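To illustrate how weighting changes the statistic, the following Python sketch computes unweighted and weighted kappa from a 3×3 table of predicted against observed course. The table is hypothetical and is not taken from the study; the function name `weighted_kappa` and the choice of linear or quadratic disagreement weights are assumptions made for illustration only.

```python
def weighted_kappa(table, scheme="linear"):
    """Kappa for a square table of counts with ordered categories.

    table[i][j] = number of patients with predicted category i and
    observed category j. scheme is "unweighted", "linear" or "quadratic".
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        # Unweighted kappa treats every off-diagonal cell as equally wrong.
        if scheme == "unweighted":
            return 0.0 if i == j else 1.0
        # Weighted kappa scales the penalty by the distance between categories.
        d = abs(i - j) / (k - 1)
        return d if scheme == "linear" else d * d

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * table[i][j] for i, j in cells) / n
    expected = sum(disagreement(i, j) * row_tot[i] * col_tot[j]
                   for i, j in cells) / (n * n)
    return 1.0 - observed / expected


# Hypothetical counts; categories ordered as
# full recovery, partial recovery, no recovery.
example = [
    [20, 10, 2],
    [8, 15, 7],
    [3, 9, 16],
]

for scheme in ("unweighted", "linear", "quadratic"):
    print(scheme, round(weighted_kappa(example, scheme), 3))
```

In this made-up table most disagreements fall in adjacent categories, so the weighted statistic comes out higher than the unweighted one, which is the pattern described above.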
Second, regardless of whether we use Cohen's kappas or weighted kappas, the authors did not examine whether the GPs' predictions are statistically significantly worse than the model's. The reported 95% confidence intervals overlap, so we do not know whether the clinicians are actually performing worse than the maximally attainable model.
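One possible way to examine such a difference, offered here only as a sketch and not as the method used in the article or in my recalculation, is a paired bootstrap: patients are resampled with replacement, both kappas are recomputed on each resample, and a percentile interval is formed for their difference. The function names, the number of resamples and the seed below are all assumptions.

```python
import random


def cohen_kappa(predicted, observed, categories):
    """Unweighted Cohen's kappa from two parallel lists of labels."""
    n = len(observed)
    p_obs = sum(p == o for p, o in zip(predicted, observed)) / n
    p_exp = sum((predicted.count(c) / n) * (observed.count(c) / n)
                for c in categories)
    if p_exp == 1.0:
        return 0.0  # degenerate resample: only one category present
    return (p_obs - p_exp) / (1.0 - p_exp)


def bootstrap_kappa_difference(gp, model, observed, categories,
                               n_boot=2000, seed=0):
    """95% percentile interval for kappa(model) minus kappa(GP).

    The same resampled patients are used for both kappas, because both
    sets of predictions refer to the same individuals.
    """
    rng = random.Random(seed)
    n = len(observed)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        g = [gp[i] for i in idx]
        m = [model[i] for i in idx]
        o = [observed[i] for i in idx]
        diffs.append(cohen_kappa(m, o, categories)
                     - cohen_kappa(g, o, categories))
    diffs.sort()
    lower = diffs[(25 * n_boot) // 1000]
    upper = diffs[(975 * n_boot) // 1000 - 1]
    return lower, upper
```

An interval that excludes zero would suggest a genuine difference in agreement; one that includes zero would leave the question open.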
Third, as the authors rightly note in the Discussion, their use of the total sample to construct the predictive model may have ‘overfitted’ the model to the data and produced artificially inflated agreement. A better approach may have been the ‘leaving-one-out method’ (Lachenbruch, 1975), in which analysts repeatedly build a model on the sample minus one observation and examine whether each model can predict the one excluded observation.
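The leaving-one-out procedure can be sketched in a few lines of Python. The example below is a minimal illustration under stated assumptions: the data are hypothetical, there is a single baseline score per patient, and a simple nearest-class-mean rule stands in for the authors' statistical model, which it does not attempt to reproduce.

```python
def fit_class_means(scores, outcomes):
    """Return the mean baseline score for each outcome category."""
    sums, counts = {}, {}
    for x, y in zip(scores, outcomes):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(means, score):
    """Predict the outcome whose class mean is closest to the score."""
    return min(means, key=lambda label: abs(score - means[label]))


def resubstitution_accuracy(scores, outcomes):
    """Fit on the total sample and test on that same sample."""
    means = fit_class_means(scores, outcomes)
    hits = sum(predict(means, x) == y for x, y in zip(scores, outcomes))
    return hits / len(scores)


def leave_one_out_accuracy(scores, outcomes):
    """Refit without each patient in turn, then predict that patient."""
    hits = 0
    for i in range(len(scores)):
        train_x = scores[:i] + scores[i + 1:]
        train_y = outcomes[:i] + outcomes[i + 1:]
        means = fit_class_means(train_x, train_y)
        hits += predict(means, scores[i]) == outcomes[i]
    return hits / len(scores)


# Hypothetical baseline severity scores and observed outcomes.
scores = [2.0, 3.0, 4.0, 5.4, 5.6, 7.0, 8.0, 9.0]
outcomes = ["recovered"] * 4 + ["not recovered"] * 4

print("resubstitution:", resubstitution_accuracy(scores, outcomes))
print("leave-one-out: ", leave_one_out_accuracy(scores, outcomes))
```

With these made-up data the model fitted to the total sample classifies every patient correctly, whereas the leave-one-out estimate is lower because the two borderline patients are misclassified once they no longer contribute to the model that predicts them.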
In this connection it may be worthwhile to point out that the comparison between human performance and that of a statistical model is a recurring theme in clinical psychology (Meehl, 1954; Goldberg, 1970). These studies conclude that, because of the inevitable random error in human judgement, the statistical model almost always outperforms the clinician. It will, therefore, be most interesting to see how, in the authors' next round of proposed investigation, clinicians can improve their performance if they are given feedback on prognostic factors.