Four studies have reported the reliability of diagnosis of bulimia nervosa over time. Interviews conducted within the same month have fair agreement (κ=0.42) (Bushnell et al, 1990). A 10-year follow-up also found moderate agreement for some behaviours but not others: binge eating (κ=0.47); self-induced vomiting (κ=0.49); laxative misuse (κ=0.50); diet pills (κ=0.45); and fasting (κ=0.25) (Field et al, 1996). Use of a fuller assessment on at least one occasion seems to promote moderate agreement (κ=0.59) (Wade et al, 1997). Reliability of a more broadly defined phenotype of bulimia nervosa may produce lower agreement (κ=0.28) (Bulik et al, 1998). This study was designed to further explore predictors of reliability of the lifetime diagnosis of bulimia nervosa in comparison with predictors of reliability of a lifetime diagnosis of major depression, which were assessed with the same diagnostic instrument in a large population-based sample of female twins.
METHOD
Participants
The data for this report are from a population-based longitudinal study of Caucasian female twins drawn from the Virginia Twin Registry (VTR). The VTR was formed from a systematic review of all birth records in the Commonwealth of Virginia (USA) after 1918. Twins were eligible to participate if they were born between 1934 and 1971 and both members had previously responded to a mailed questionnaire completed over 1987-88 (individual response rate of 64%). Data used in the present study are from the first interview wave and accompanying self-report personality measures, and the third interview wave, which will be called Time 1 interview and Time 2 interview, respectively. At Time 1 (1987-89), 92% of the eligible individuals (n=2163) were interviewed (90% face to face and the remainder by telephone). The mean age of the twins was 29.3 years (s.d.=7.7, range 17-54 years). Time 2 (1991-93) occurred on average 5.1 years (s.d.=0.4) later. Written informed consent was obtained prior to face-to-face interviews and verbal assent prior to telephone interviews.
Measures
Interviews were conducted blind to information about co-twins. Information about interviewer characteristics has been presented elsewhere (Kendler et al, 1991). A narrow definition of lifetime bulimia nervosa, or one that conformed strictly to DSM-III-R (American Psychiatric Association, 1987) criteria, was used. In addition, in order to maximise statistical power in the study of a low prevalence disorder, a broad definition of lifetime bulimia nervosa was adopted where the DSM-III-R 'D' criterion was omitted because there appear to be few meaningful differences between women who binge and use associated weight-loss methods twice a week and those who engage in such behaviours less than twice a week (Garfinkel et al, 1995; Sullivan et al, 1998). This broad category differs slightly from its previous use (Bulik et al, 1998), in that it includes women with a wider range of concern about their body shape and weight, from “a lot more concerned than most women your age” to “a little bit more concerned”.
At the first interview there was one probe question (“Have you ever in your life had eating binges during which you ate a lot of food in a short period of time?”). If this was answered negatively, no further questions were asked. At the second interview, a further probe question was asked, relating to weight loss behaviours (“Have you ever made yourself throw up as a means of controlling your shape and weight?”). If these were both answered negatively, no further questions were asked.
The diagnosis of DSM-III-R major depression was made using questions from the Structured Clinical Interview for DSM-III-R (SCID) (Spitzer et al, 1992). Numerous probe questions were used to ascertain the presence of depressive symptoms. Initially, occurrence of major depression over the past year was assessed, using a probe question for each one of the diagnostic criteria. Then major depression over the lifetime (excluding the past year) was assessed with two probe questions.
A description of the variables examined for predictive value of reliability is provided in Table 1. All predictor variables are from the Time 1 interview period unless otherwise noted.
Variable type | Description of variable |
---|---|
Demographics | Years of education (highest grade of school or year of college completed); urbanisation (population size of area lived in at time of interview); annual salary; parental education |
Age | Age at first interview; age of developing bulimia nervosa recalled at both Time 1 and Time 2 |
Current symptomatology | 20 items of the Padua Inventory (Sanavio, 1988) assessing levels of obsessive and compulsive symptomatology; total score of the Symptom Check-List (Derogatis, 1975), assessing depression, panic and agoraphobia, somatisation and anxiety, and sleep difficulties (Kendler et al, 1994) from the previous 30 days |
Lifetime comorbidity | Structured Clinical Interview (Spitzer et al, 1992) for: DSM-III-R alcohol dependence and panic disorder; DSM-III phobias; DSM-III-R symptoms and DSM-III duration generalised anxiety disorder |
Personality and attitudes | Extroversion and neuroticism (Eysenck & Eysenck, 1975); altruism (7 items from the Interpersonal Reactivity Index; Davis, 1980); interpersonal dependency (Hirschfeld et al, 1977); mastery (reversed coding of the powerlessness sub-scale of the Alienation subtest; Maddi et al, 1979); dispositional optimism (Scheier & Carner, 1985); self-esteem (Rosenberg, 1965); locus of control (resourcefulness sub-scale of the Attributional Style Questionnaire; Peterson et al, 1982) |
Statistical analyses
Agreement between Time 1 and Time 2 diagnoses was examined using the kappa coefficient (κ), tetrachoric correlations and Yule's Y statistic. Yule's Y is less dependent on the base rate than κ, which permits a more direct comparison between the higher-prevalence major depression and the lower-prevalence bulimia nervosa. We also calculated sensitivity, the proportion of true cases correctly identified (indexing the risk of false negatives), and specificity, the proportion of true non-cases correctly identified (indexing the risk of false positives). For the purpose of these calculations, the Time 2 assessment was chosen as the standard, as it contained more probe questions than the Time 1 assessment. One would expect sensitivity to be higher for more prevalent disorders and specificity to be higher for less prevalent disorders.
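For reference, the agreement measures named above can be written out for a 2×2 table of Time 1 by Time 2 diagnoses. The cell labels a, b, c, d and n are notation introduced here for illustration and do not appear in the original; these are the standard textbook definitions, not necessarily the exact computations performed by the authors. The tetrachoric correlation is omitted because it requires numerical estimation rather than a closed-form expression.

```latex
% Cells of the 2x2 table, with Time 2 taken as the standard:
% a = case at both interviews, b = case at Time 1 only,
% c = case at Time 2 only,     d = case at neither; n = a + b + c + d.
\begin{align*}
p_o &= \frac{a + d}{n}, \qquad
p_e = \frac{(a + b)(a + c) + (c + d)(b + d)}{n^2}, \\
\kappa &= \frac{p_o - p_e}{1 - p_e}, \\
Y &= \frac{\sqrt{ad} - \sqrt{bc}}{\sqrt{ad} + \sqrt{bc}}, \\
\text{sensitivity} &= \frac{a}{a + c}, \qquad
\text{specificity} = \frac{d}{b + d}.
\end{align*}
```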
The ability of variables to predict reliability, sensitivity and specificity was then examined using logistic regression. Results are presented as odds ratios with 95% CIs. As twin pair observations are correlated, the assumption of independent sampling is violated, and we therefore used generalised estimating equation (GEE) modelling (Zeger et al, 1988) to adjust standard errors for non-independent observations using the GENMOD procedure.
Finally, separate stepwise logistic regressions were used to examine the relative importance of the significant predictors for reliability, sensitivity and specificity of reporting bulimia nervosa. All analyses were carried out with SAS version 7.0 (SAS Institute, 1996).
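To make the reporting of odds ratios with 95% CIs concrete, the following is a minimal sketch of an unadjusted odds ratio with a Wald interval for a single binary predictor. It assumes independent observations, so it does not reproduce the GEE adjustment for twin-pair clustering described above; the function name and the counts are hypothetical and are not taken from the study.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio and Wald confidence interval for a 2x2 table.

    a: predictor present, outcome present
    b: predictor present, outcome absent
    c: predictor absent,  outcome present
    d: predictor absent,  outcome absent
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cells must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under independent sampling (no GEE adjustment).
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for illustration only.
example = odds_ratio_wald_ci(40, 10, 20, 30)
print("OR = %.2f (95%% CI %.2f-%.2f)" % example)
```

With correlated twin data the point estimate is usually similar, but the interval from this naive calculation should not be expected to match a GEE-based interval.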
RESULTS
Agreement between interviews
For the purposes of this study, women who reported lifetime bulimia nervosa at Time 2 but not Time 1, and who reported age of onset as being after Time 1, were considered to have developed bulimia nervosa between the two assessments. These women were removed from further analysis so that onsets that occurred between Time 1 and Time 2 would not be confounded with unreliable recall. This resulted in two women being removed when considering narrowly defined bulimia nervosa and 11 women being removed when considering broadly defined bulimia nervosa. Reliability of major depression has previously been considered with complete twin pairs only (Foley et al, 1998). As completeness of twin pairs was irrelevant to our analyses, we considered data from all twins, thus increasing the number of women studied and producing a slightly higher κ value than previously reported. When onset of first-episode major depression between Times 1 and 2 was considered, 113 women were removed from further analysis. Taking into account the measures that are less dependent on the base rate (tetrachoric correlations and Yule's Y), narrowly defined bulimia nervosa is the most reliable diagnosis, and the reliabilities of broadly defined bulimia nervosa and major depression are similar (Table 2). The bulimia nervosa diagnoses have the lowest risk for assigning false-positive cases, but the highest risk for assigning false-negative cases.
Diagnosis | Time 1 assessment | Time 2: diagnosis present | Time 2: diagnosis absent |
---|---|---|---|
Narrowly defined bulimia nervosa | Present | 11 | 31 |
Narrowly defined bulimia nervosa | Absent | 8 | 1845 |
Broadly defined bulimia nervosa | Present | 33 | 49 |
Broadly defined bulimia nervosa | Absent | 59 | 1745 |
Major depression | Present | 387 | 277 |
Major depression | Absent | 130 | 988 |
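The comparison of base-rate-sensitive and less base-rate-dependent measures can be illustrated directly from the counts in Table 2. The sketch below assumes that rows are Time 1 diagnoses and columns are Time 2 diagnoses, with Time 2 treated as the standard; the function and variable names are introduced here for illustration. It uses the standard formulas for κ and Yule's Y, so the output may differ slightly from the published statistics, and tetrachoric correlations are not computed.

```python
import math


def agreement_stats(a, b, c, d):
    """Agreement measures for a 2x2 table of Time 1 by Time 2 diagnoses.

    a: case at both interviews     b: case at Time 1 only
    c: case at Time 2 only         d: case at neither interview
    Time 2 is treated as the standard for sensitivity and specificity.
    """
    n = a + b + c + d
    p_obs = (a + d) / n
    p_exp = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (p_obs - p_exp) / (1 - p_exp)
    root_ad, root_bc = math.sqrt(a * d), math.sqrt(b * c)
    yules_y = (root_ad - root_bc) / (root_ad + root_bc)
    return {
        "kappa": kappa,
        "yules_y": yules_y,
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
    }


# Counts as they appear in Table 2 (a, b, c, d).
TABLE_2 = {
    "narrow bulimia nervosa": (11, 31, 8, 1845),
    "broad bulimia nervosa": (33, 49, 59, 1745),
    "major depression": (387, 277, 130, 988),
}

results = {name: agreement_stats(*cells) for name, cells in TABLE_2.items()}
for name, stats in results.items():
    print(name, {key: round(value, 2) for key, value in stats.items()})
```

On these counts κ is highest for major depression, whereas Yule's Y is highest for narrowly defined bulimia nervosa, which is the pattern the text describes when moving from a base-rate-sensitive to a less base-rate-dependent measure.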
Clinical features predicting reliability of bulimia nervosa
By far the largest category of women with unreliably reported bulimia nervosa comprised women who met the full criteria at one interview and gave negative replies to the probe question(s) at the other interview: this occurred approximately one-third of the time for narrowly defined bulimia nervosa and approximately half the time for broadly defined bulimia nervosa. Reported use of self-induced vomiting or laxative misuse at either interview significantly predicted reliability (P=0.005, odds ratio=3.48, 95% CI 1.45-8.35). The likelihood of reporting a behaviour associated with bulimia nervosa at Time 2 was dependent on the type of behaviour reported at Time 1 (see Table 3). The most memorable weight loss behaviours were self-induced vomiting (with the odds of reporting vomiting at the second interview 34 times higher if vomiting was reported at the first interview) and laxative misuse (with the odds of reporting laxative misuse at the second interview 28 times higher if laxative misuse was reported at the first interview). In contrast, odds of recalling strict dieting or fasting at Time 2 were only about twice as high if such behaviour was reported at Time 1. Binge eating was less likely to be recalled than either self-induced vomiting or laxative misuse, but more likely to be remembered than other weight loss behaviours.
Weight loss behaviour | κ | Odds ratio (95% CI) | P |
---|---|---|---|
Self-induced vomiting | 0.56 | 34.11 (15.74-73.94) | <0.0001 |
Laxative misuse | 0.50 | 28.74 (9.41-87.81) | <0.0001 |
Binge eating | 0.37 | 6.13 (4.84-7.78) | <0.0001 |
Exercise | 0.23 | 3.61 (1.77-7.38) | 0.0003 |
Strict dieting | 0.15 | 2.29 (1.12-4.67) | 0.02 |
Fasting | 0.13 | 2.41 (0.97-5.98) | 0.05 |
The more detailed Time 2 data were used to investigate any differences in frequency of eating disorder behaviours between those women with reliably reported bulimia nervosa and those women with unreliably reported bulimia nervosa. Results are summarised in Table 4. The strongest association exists between reliability and frequency of binge eating. For both narrowly defined and broadly defined bulimia nervosa, a higher monthly frequency of binge eating predicted more reliable reporting.
Symptom | Narrowly defined bulimia nervosa: mean reliable | Narrowly defined bulimia nervosa: mean unreliable | Broadly defined bulimia nervosa: mean reliable | Broadly defined bulimia nervosa: mean unreliable | Odds ratio (95% CI) |
---|---|---|---|---|---|
Frequency of binges each month | 26.73 | 9.45 | 12.21 | 5.64 | 1.07 (1.02-1.12) ** |
Duration of binge eating (in weeks) | 248.40 | 202.90 | 215.46 | 195.21 | 1.00 (0.99-1.01) |
Frequency of vomiting each month | 59.78 | 20.60 | 39.28 | 7.67 | 1.04 (0.97-1.12) |
Duration of vomiting (in weeks) | 145.78 | 58.73 | 91.67 | 32.04 | 1.01 (0.99-1.03) |
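As a reading aid for the binge-frequency odds ratio in Table 4: if, as is usual for a continuous predictor in logistic regression, the odds ratio of 1.07 is interpreted per one additional binge per month (an assumption, since the unit is not stated in the text), then the implied odds ratio for a difference of k binges per month is the k-th power of the per-unit value. The symbol k is introduced here for illustration.

```latex
\mathrm{OR}_k = \mathrm{OR}_1^{\,k},
\qquad \text{e.g. } 1.07^{10} = e^{10 \ln 1.07} \approx 1.97 .
```

Under that assumption, a woman reporting ten more binges per month would have roughly twice the odds of reliable reporting.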
Predictors of reliability, sensitivity and specificity of bulimia nervosa and major depression
For the remaining analyses, there was insufficient power to calculate the odds ratio for narrowly defined bulimia nervosa. Therefore, only results for broadly defined bulimia nervosa and major depression are reported here.
Reliability
For bulimia nervosa, more years of education, parental education and decreased likelihood of lifetime major depression were significantly associated with more reliable reporting (data not shown). The women with reliably reported major depression were significantly older than the women with unreliably reported major depression, had higher levels of obsessiveness, general anxiety and depression, and were more likely to experience lifetime generalised anxiety disorder (GAD), panic disorder and simple phobias. There was also considerable influence of personality on the reliability of major depression reporting, where women who reliably reported major depression were significantly more dependent, experienced less mastery, were less optimistic, had lower self-esteem and were more neurotic. In other words, this group appeared to be generally more impaired.
Sensitivity
Increased ability to detect true cases of bulimia nervosa was predicted by more years of parental education and lower levels of altruism. However, because of the lack of convergence occurring in the logistic regression and the consequent inability to produce odds ratios, not all variables could be satisfactorily examined. Increased sensitivity of major depression was predicted by a lower financial status, higher levels of obsessive symptomatology and neuroticism, increased risk for lifetime comorbidity, especially GAD, and lower levels of mastery and optimism.
Specificity
Increased ability to correctly identify true non-cases of bulimia nervosa was predicted by lower levels of current symptomatology, decreased risk for lifetime comorbidity, higher levels of mastery and self-esteem and lower neuroticism. Increased specificity of major depression was predicted by a higher financial status, lower levels of current symptomatology, decreased risk of lifetime comorbidity, lower levels of altruism, dependency and neuroticism and greater optimism.
Multivariate contribution of predictor variables to reliability, sensitivity and specificity
The relative contributions of those predictor variables shown to significantly predict reliability of reporting of broadly defined bulimia nervosa were examined in a stepwise regression model, including reported use of either self-induced vomiting or laxatives, frequency of binge eating, years of education, educational status of parents and presence of lifetime major depression. The variables retained in the equation that predicted more reliable reporting of bulimia nervosa were decreased likelihood of lifetime major depression at either Time 1 or Time 2 (χ²=5.18, P=0.02), use of either self-induced vomiting or laxatives (χ²=4.84, P=0.03), and greater frequency of binges each month (χ²=4.28, P=0.04). Predictors of greater reliability of major depression reporting (including only those significant predictor variables) included greater likelihood of GAD (χ²=23.17, P<0.0001), a higher score on the Symptom Check-List (Derogatis, 1975) at Time 2 (χ²=7.28, P=0.007), and increased obsessionality (χ²=4.83, P=0.03).
Due to the low predictive power of the sensitivity measure, this was not examined in a multiple regression for bulimia nervosa. Of those variables that significantly predicted greater sensitivity for major depression in the univariate analyses, two were retained in the final equation: greater likelihood of lifetime GAD (χ²=28.92, P<0.0001) and lower financial status (χ²=7.03, P=0.008). Of those variables that significantly predicted greater specificity of bulimia nervosa in the univariate analyses, the following were retained in the final equation: decreased likelihood of lifetime major depression (χ²=10.37, P=0.001) and panic disorder (χ²=5.88, P=0.02), and increased levels of mastery (χ²=6.64, P=0.01). Correspondingly, variables that best predicted major depression specificity were a lower likelihood of lifetime GAD (χ²=92.22, P<0.0001) and alcohol dependency (χ²=16.91, P<0.0001) and lower levels of altruism (χ²=5.37, P=0.02).
DISCUSSION
From previous literature, using a base rate sensitive measure (κ), bulimia nervosa would appear to be a less reliable diagnosis than major depression, usually showing low to modest agreement between assessments (Bushnell et al, 1990; Field et al, 1996; Bulik et al, 1998). We replicate the finding that base rate sensitive measures (i.e. κ) show bulimia nervosa to be less reliably diagnosed than major depression.
However, given the much greater prevalence of major depression than bulimia nervosa, use of measures less dependent on the base rate may be a more appropriate way of comparing reliabilities. The use of such measures (i.e. Yule's Y) shows bulimia nervosa to be as reliably diagnosed as major depression. As might be predicted, a non-case of bulimia nervosa is less likely to be mislabelled as a case than is a non-case of major depression. The fairly distinctive behavioural markers for bulimia nervosa (e.g. binge eating, vomiting) compared with the less discrete features of major depression, which can be shared with other disorders (e.g. insomnia, fatigue, diminished ability to concentrate), may amplify this effect. On the other hand, it is much more difficult to accurately identify true cases of bulimia nervosa than major depression. The occurrence of past major depression may be more accessible to memory as the symptoms are more likely to be reminiscent of aspects of current life experience than are those of past bulimia nervosa. In addition, the presence of more probe questions in the interview for major depression than bulimia nervosa may account for the greater difficulty in detecting bulimia nervosa cases than major depression cases. This suggestion is consistent with the body of neuropsychological literature, which shows that verbal prompts improve verbal recall for both younger and older adults (Cherry et al, 1996).
Salience of behavioural markers
In terms of overall reliability, we replicated the findings of Field et al (1996) where the majority of unreliable cases were women who reported full symptoms of bulimia nervosa on one occasion and responded negatively to probe questions on the other. Of all the behaviours associated with bulimia nervosa reported at Time 1, it was the presence of self-induced vomiting and laxative misuse that were most likely to be remembered at Time 2. This suggests vomiting and laxatives are more salient behavioural markers than other weight loss behaviours, and thereby less vulnerable to memory decay. However, a higher monthly frequency of binge eating rather than any weight loss behaviour significantly predicts reliable reporting of lifetime bulimia nervosa. As not all women use vomiting or laxatives, the frequencies of these behaviours may have had insufficient predictive power. These findings concur with studies on the reliability of major depression, which suggest the more severe the symptomatology, the more memorable the disorder (Aneshensel et al, 1987; Foley et al, 1998).
Role of sensitivity and specificity in determining reliability
There appear to be more differences than similarities in the profiles of overall predictive reliability of bulimia nervosa and major depression. Reliability of major depression reporting appears to be affected by overall level of functioning of the individual. The less well the person, as indicated by a number of measures including personality, current symptomatology and lifetime psychopathology, the more likely they were to reliably recall having had major depression. In contrast, there was no effect of personality or attitudes on reliability of bulimia nervosa reporting, and the strongest predictor, apart from the behavioural markers, was a lower likelihood of lifetime major depression. This finding can be explained by examination of sensitivity and specificity. The presence of true cases of major depression is marked by increased problems with psychiatric and personality functioning (unfortunately our ability to detect true cases of bulimia nervosa was limited). Conversely, the detection of true non-cases of both bulimia nervosa and major depression was marked by fewer problems with psychiatric and personality functioning. This would suggest that the overall reliability of bulimia nervosa seems to be characterised more by its ability to accurately detect non-cases, whereas the overall reliability of major depression is characterised more by its ability to detect cases.
A simple comparison of general reliability measures across psychiatric diagnoses is insufficient to elucidate the nature and mechanisms of unreliability. A more useful approach is to examine specific aspects of reliability of reporting, such as sensitivity and specificity. Given that the majority of population-based epidemiological studies utilise structured clinical interviews to identify cases of bulimia nervosa similar to the ones used in this investigation, several strategies can be employed to improve reliability of reporting in the context of such interviews. Incorporating more than one occasion of measurement (Bulik et al, 1998) and using more specialised assessment instruments (Wade et al, 1997) can improve reliability. In addition, the inclusion of a greater number of probe questions can increase the probability that memory of past disorders will more successfully be activated and accessed, thus increasing the detection of true cases.
CLINICAL IMPLICATIONS AND LIMITATIONS
CLINICAL IMPLICATIONS
▪ Individuals who self-induce vomiting or misuse laxatives have more reliable recall of bulimia nervosa.
▪ More frequent occasions of binge eating are associated with more reliable reporting of the disorder.
▪ It is wise to include several probe questions about several behaviours when assessing bulimia nervosa (e.g. binge eating, vomiting, laxative misuse) rather than using a single probe to stimulate recall.
LIMITATIONS
▪ Only women were assessed; therefore, these results cannot be applied to men.
▪ We have low power to draw any conclusions about variables that predict sensitivity of bulimia nervosa.
▪ The use of Time 2 assessment as the standard is based on this assessment having only one more probe question than Time 1 assessment.
ACKNOWLEDGEMENTS
This work was supported by the United States National Institutes of Health (NIH) grants MH-40828, MH-42953, AA-09095, K Award MH-01277 (to K.S.K.) and K01-MH-01553 (to C.M.B.). The Virginia Twin Registry was established by W. Nance and maintained by L. Corey and is supported by United States NIH grants HD-26746 and NS-31564. We would also like to thank the twins for their participation in this research.