National Health Service (NHS) mental health beds in England were reduced by 73% between 1987–1988 and 2018–2019, from 67 100 to 18 400 (Wyatt Reference Wyatt, Aldridge and Callaghan2019). This reduction has been attributed to the government policy of ‘care in the community’. The number of mental health beds occupied also decreased, albeit at a slower rate, leading to increased bed occupancy. Bed occupancy in mental health units was reported to have reached 90% by 2018–2019 in England (Wyatt Reference Wyatt, Aldridge and Callaghan2019), exceeding the Royal College of Psychiatrists’ optimal recommendation of <85% (Royal College of Psychiatrists 2011: p. 10). Medical non-psychiatric units have reduced bed occupancy by reducing the average length of stay, but mental health units have mainly reduced patient admissions. This could be explained by the reported increased average threshold for admissions across England (Wyatt Reference Wyatt, Aldridge and Callaghan2019).
Length of hospital stay for mental health conditions varies across the NHS. Multiple factors increase the likelihood of longer hospital stay, including male gender, Black, Asian and Minority ethnic (BAME) background, being homeless or in supported accommodation, diagnoses of psychosis and number of care coordinators (Newman Reference Newman, Harris and Evans2018). Other factors not studied but recognised by clinicians include variation in admission thresholds and in estimation of risk level between clinicians. The National Service Framework for Mental Health (Department of Health 1999), which sets quality standards for mental health services, endorses short stay with good-quality community care and rapid follow-up. Nevertheless, the average length of stay in a mental health bed across England is still around 7 weeks (Wyatt Reference Wyatt, Aldridge and Callaghan2019).
The Cochrane review
The clinical question
This month's Cochrane Corner review (Babalola Reference Babalola, Gormez and Alwan2014) aimed to compare short- and long-stay admissions to mental health hospitals for people with severe mental illness. A total of 2030 participants from 6 randomised controlled trials (RCTs) (Box 1) were included. The largest trial (Burhan Reference Burhan1969) involved 1169 participants. Participants were described as ‘people with schizophrenia, related disorders or “severe/chronic mental disorders/illnesses”, however defined’. All trials focused on an adult population, excluding children, adolescents, the elderly and those with intellectual disabilities. Kennedy & Hird's study (Kennedy Reference Kennedy and Hird1980) included patients from unselected acute psychiatric settings, such as individuals with organic brain disease and alcohol problems, introducing heterogeneity into the pooled analysis.
A randomised trial is an experimental study where participants are randomly allocated to either an intervention or a comparison group (e.g. another intervention). When the comparison group receives either placebo or no intervention, the trial is called a randomised controlled trial (RCT).
The two interventions under comparison were ‘planned short stay’ and ‘planned long stay’, however defined within the studies. The review authors proposed an arbitrary cut-off value of 28 days, based on the compulsory detention period for assessment defined by the Mental Health Act 1983, but there was noticeable variability in the definitions of short stay – 1 (Herz Reference Herz, Endicott and Spitzer1975) to 4 weeks (Glick Reference Glick, Hargreaves and Raskin1975) – and of long-stay between studies.
Based on duration since admission, outcomes were categorised into short (<3 months), medium (3–6 months), long (6–12 months) and longer term (1–2 years or more), but only data for long-term outcomes were available for analysis. The reported outcomes were grouped into primary outcome (global state) and secondary outcomes (death, change in specific symptoms of schizophrenia, readmission, premature discharge, delayed discharge, leaving the study early and general functioning). Binary data alone were used to express the effects of interventions, as standard deviations for scales used in the trials were not reported and could not be obtained.
Methodology
The original search (Johnstone Reference Johnstone, Zolese and Alwan1999) identified five RCTs that met the inclusion criteria. An update in 2007 (Alwan Reference Alwan, Johnstone and Zolese2008) added one more trial to the analysis. For the latest update (Babalola Reference Babalola, Gormez and Alwan2014), the authors searched the Cochrane Schizophrenia Group's register, which is based on MEDLINE, Embase, CINAHL and PsycInfo, in May 2012. They also searched the references of identified studies and contacted first authors of included studies for further unpublished trials. Eventually, the review included a total of six RCTs conducted between 1960 and 1980. Two quasi-randomised trials were identified but were excluded from the main analysis.
Two review authors independently assessed the quality of the included trials using the GRADE system, and risk of bias using criteria from the Cochrane Handbook for Systematic Reviews of Interventions (Higgins Reference Higgins and Green2011) (there is uncertainty regarding which version they used: Babalola et al give 2011 as the publication date of version 5.0.2, whereas 2011 is the date of version 5.1: training.cochrane.org/cochrane-handbook-systematic-reviews-interventions). Although all trials were randomised, no trial explicitly reported the means of randomisation. Glick et al (Glick Reference Glick, Hargreaves and Goldfield1974) had the lowest risk in terms of random sequence generation and allocation concealment (selection bias) (Box 2).
Random allocation sequence is a key component of randomised trials (Dettori Reference Dettori2010), and if performed correctly on a large sample, it does significantly reduce the risk of bias, especially selection bias. It has two components: generating an unpredictable random sequence (random sequence generation) and concealment until participants are assigned to intervention/control groups (allocation concealment).
Note that not all methods described as ‘random allocation’ are in fact random. Examples of methods to be avoided if random allocation is desired include using hospital chart numbers, alternating patients sequentially or assigning by date of birth. The best methods for random allocation use a random-numbers table or a computer software programme.
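As an illustration of computer-generated allocation, the following is a minimal Python sketch using permuted blocks. Permuted blocks are one common approach chosen here for illustration; none of the included trials reported its method. The function name, arm labels, block size and seed are all hypothetical.

```python
import random


def permuted_block_sequence(n_participants, block_size=4,
                            arms=("short stay", "long stay"), seed=None):
    """Return a random allocation sequence built from permuted blocks.

    Each block contains every arm equally often, so group sizes stay
    balanced while the order within each block is unpredictable.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # seeded only so the example is reproducible
    per_arm = block_size // len(arms)
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * per_arm
        rng.shuffle(block)  # random order within the block
        sequence.extend(block)
    return sequence[:n_participants]


# Hypothetical example: 12 participants, blocks of 4
allocation = permuted_block_sequence(12, block_size=4, seed=2014)
print(allocation)
```

Generating the sequence covers only the first component described above. For allocation concealment, the list would need to be held by someone independent of recruitment and revealed one assignment at a time.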
Masking (‘blinding’) of participants and clinicians could not be realistically achieved. However, other forms of masking, such as masking of data analysts (Box 3), could have been implemented. No form of masking was reported in any trial involved, resulting in high risk of performance and detection biases.
Masking (‘blinding’) is essentially the concealment of research design elements, such as group assignment, treatment agent and research hypotheses, from certain groups. It is particularly important when subjectivity of assessment is expected. Masking can be done at three main levels (Page Reference Page and Persch2013):
• masking of participants: to minimise altered attitude and cooperation resulting from knowledge of group assignment
• masking of healthcare providers: which is important when knowledge of assignment could change normal care decisions or outcome monitoring, owing, for example, to excitement or enthusiasm about the intervention
• masking of data collectors: to ensure objectivity in recording response to interventions under comparison.
Excepting the Burhan trial (Burhan Reference Burhan1969), all trials reported incomplete outcome data in their analyses at different percentages, the largest being in the Hirsch et al trial (Hirsch Reference Hirsch, Platt and Knights1979), with 53% exclusion at 1 year. Only data for two outcomes could be used from this study (readmissions and loss to follow-up at 1 year), as intention-to-treat numbers could not be calculated.
Data were reported using standard estimation of risk ratio (RR) with 95% confidence intervals (95% CI). Although P-values were not reported, confidence intervals are more informative than P-values (du Prel Reference du Prel, Hommel and Röhrig2009) (Box 4). For statistically significant results, the numbers needed to treat to provide benefit (NNTB) and to induce harm (NNTH) were calculated, with 95% CI. Heterogeneity was assessed on clinical, methodological and statistical levels. The review authors chose to use a fixed-effect model over random-effects ones for data synthesis.
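To make these measures concrete, here is a minimal Python sketch of how a risk ratio, its 95% CI and a number needed to treat can be derived from a two-by-two table. It uses the standard large-sample formula for the standard error of log(RR). The event counts are hypothetical and are not taken from any included trial, and the review's own software may have used a different method.

```python
import math


def risk_ratio_ci(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A versus group B with a 95% CI (log method)."""
    rr = (events_a / n_a) / (events_b / n_b)
    # Large-sample standard error of log(RR)
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


def number_needed_to_treat(events_a, n_a, events_b, n_b):
    """1 / absolute risk difference, rounded up to a whole patient."""
    risk_difference = events_a / n_a - events_b / n_b
    return math.ceil(1 / abs(risk_difference))


# Hypothetical counts: 30/100 events with short stay, 52/100 with long stay
rr, lower, upper = risk_ratio_ci(30, 100, 52, 100)
nnt = number_needed_to_treat(30, 100, 52, 100)
print(f"RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}, NNT = {nnt}")
```

If the interval excludes 1, the result is statistically significant at the 5% level. Whether the number needed to treat is an NNTB or an NNTH depends on whether the event is harmful and which group has fewer events.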
The P-value is the probability of obtaining a result at least as extreme as the one observed if the null hypothesis were true (i.e. if chance alone were operating). Standard scientific practice defines a P-value of less than 1 in 20 (P < 0.05) as ‘statistically significant’ and a P-value of less than 1 in 100 (P < 0.01) as ‘statistically highly significant’. P-values allow a binary (yes/no) decision to be made about a previously formulated null hypothesis.
A confidence interval (CI) is a range of values, calculated from the study data, that is expected to contain the true value of the quantity being estimated with a predefined probability; 95% is usually used in clinical trials, which means that if the study were repeated many times, about 95 out of 100 such intervals would contain the true value.
Confidence intervals and P-values are complementary measures, and usually are both reported in research articles. Confidence intervals have the advantage of providing information about the range of the observed effect size, and the width of the confidence interval gives an idea of the precision of the results (du Prel Reference du Prel, Hommel and Röhrig2009).
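The complementary relationship can be illustrated by recovering an approximate P-value from a reported risk ratio and its 95% CI. The Python sketch below assumes that the interval was computed on the log scale using a normal approximation, and applies this to the figures reported for deaths at 2 years (RR = 0.42, 95% CI 0.10–1.83). The function name is hypothetical, and the result is approximate because the published figures are rounded.

```python
import math
from statistics import NormalDist


def p_value_from_rr_ci(rr, lower, upper, z_crit=1.96):
    """Approximate two-sided P-value from a risk ratio and its 95% CI."""
    # The CI spans 2 * z_crit standard errors on the log scale
    se_log_rr = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(rr) / se_log_rr
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


# Deaths at 2-year follow-up, as reported in the review
z, p = p_value_from_rr_ci(0.42, 0.10, 1.83)
print(f"z = {z:.2f}, P = {p:.2f}")
```

A wide interval that includes 1 corresponds to a P-value above 0.05, but the interval additionally shows how imprecise the estimate is.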
Results
Disappointingly, no study reported outcome data for the primary outcome (change in global state).
No significant difference in reported deaths was found at 2-year follow-up (n = 175, RR = 0.42, 95% CI 0.10–1.83). Causes of death related to mental illness were unfortunately indistinguishable from other causes.
Improvement of mental state was not different between groups, whether measured by the Psychiatric Evaluation Form (PEF) scale (n = 61, 1 RCT, RR = 3.39, 95% CI 0.76–15.02) or the Health-Sickness Rating Scale (HSRS) (n = 61, 1 RCT, RR = 0.97, 95% CI 0.31–3.01).
No difference in readmission rates at 1 year (n = 651, 4 RCTs, RR = 1.26, 95% CI 1.00–1.57) or 2 years (n = 229, 2 RCTs, RR = 1.03, 95% CI 0.78–1.36) was found. Interestingly, adding data from the Burhan trial (Burhan Reference Burhan1969) introduced elevated heterogeneity into the analysis of this outcome (I² = 71.7% at 1 year and 92.7% at 2 years), resulting in significantly fewer readmissions in the short-stay group (n = 1169, RR = 0.22, 95% CI 0.07–0.67 at 1 year; n = 1169, RR = 0.21, 95% CI 0.11–0.41 at 2 years). A newer study (Moran Reference Moran, Jacobs and Mason2017), however, disagrees with such findings, reporting an association between higher rates of emergency readmissions and shorter length of stay.
Adding data from the Kennedy & Hird trial (Kennedy Reference Kennedy and Hird1980) introduced heterogeneity (I² = 62.4% at 1 year) in the opposite direction: the short-stay group had more readmissions, albeit for shorter duration (RR = 2.23, 95% CI 1.3–3.7 at 1 year).
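The I² statistic quoted above expresses the percentage of variability in effect estimates that is due to heterogeneity rather than chance. The following minimal Python sketch derives it from Cochran's Q using inverse-variance weights on the log risk ratio scale. The three study estimates and their variances are hypothetical, chosen only to show how one discordant study inflates I²; the weighting scheme is an assumption and may differ from the one used in the review.

```python
import math


def i_squared(log_effects, variances):
    """I² (%) from Cochran's Q with inverse-variance weights."""
    weights = [1 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, log_effects)) / sum(weights)
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, log_effects))
    df = len(log_effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical log risk ratios: two concordant studies and one outlier
concordant = i_squared([math.log(1.2), math.log(1.3)], [0.02, 0.03])
with_outlier = i_squared(
    [math.log(1.2), math.log(1.3), math.log(0.25)], [0.02, 0.03, 0.30]
)
print(f"I² without outlier: {concordant:.0f}%")
print(f"I² with outlier: {with_outlier:.0f}%")
```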
No difference in early discharge rates (described as abrupt premature discharge against medical advice) (n = 229, 2 RCTs, RR = 0.77, 95% CI 0.34–1.77) was found. Significantly fewer delayed discharges were noted in the short-stay group (n = 404, 3 RCTs, RR = 0.54, 95% CI 0.33–0.88), which agrees with the concept of institutionalisation, where longer hospital stays make it difficult for patients to reintegrate into society. This should be interpreted with caution, though, as including data from the quasi-randomised trials eliminated this effect. Even though these studies are of lower quality in terms of randomisation, it is hard to say with certainty whether this explains the heterogeneity.
There was no difference in incidence of self-harm episodes (described as violent acts to the self or parasuicide episodes) (n = 247, 1 RCT, RR = 0.17, 95% CI 0.02–1.30). Acts of self-harm are more common in certain groups of patients (e.g. those with borderline personality disorder) and no association between diagnostic category and self-harm episodes was reported in the single study that reported on this outcome.
Participants in short-stay groups were more likely to be employed at 2 years, i.e. they had a lower risk of unemployment (n = 330, 2 RCTs, RR = 0.61, 95% CI 0.50–0.76, NNTB = 5, 95% CI 4–8). Again, this agrees with the concept of institutionalisation. No difference in work attendance at either short-term (3 weeks) (n = 247, 1 RCT, RR = 1.50, 95% CI 0.61–3.65) or medium-term (4 months) assessment (n = 247, 1 RCT, RR = 1.70, 95% CI 0.75–3.85) was reported.
Although the Glick et al trial (Glick Reference Glick, Hargreaves and Drues1976) reported that the mean cost of out-patient care was higher in the short-stay group, there was no reference to the statistical significance of this difference. This could be explained by the more intensive community follow-up, although this was not reported in the trial.
Discussion
Quality of evidence
Quality of evidence was low or very low for all outcomes. Reasons included findings supported by a single study, low numbers of participants, inconsistency between studies, and risk of bias related to randomisation, allocation concealment and masking. The fact that these trials were reported before the Consolidated Standards of Reporting Trials (CONSORT) guidelines existed (they were first published in 1996 and revised in 2001) could explain some of the biases, particularly the limited reporting of randomisation and unclear reasons for participant loss at follow-up. Implementing reporting standards using CONSORT will, in a way, also improve the design of clinical trials.
Participants
The broad definition of participants included different diagnostic categories. There was no clear definition or cut-off for illness severity, and data from trials were insufficient to report on subgroups with similar conditions or severities. This is important as different psychiatric diagnoses have different care needs and different prognoses. For example, patients with severe mental illness with prominent depressive features are at higher short-term risk for suicide following discharge (Olfson Reference Olfson, Wall and Wang2016).
The Health of the Nation Outcome Scales (HoNOS) (Wing Reference Wing, Beevor and Curtis1998) (Box 5) could provide an answer to the subjective description of severity.
HoNOS is an instrument developed by the Royal College of Psychiatrists to assess the health and social functioning of people with severe mental illness (Wing Reference Wing, Beevor and Curtis1998). The scales are widely used by National Health Service mental health foundation trusts in England, and they provide a validated tool to assess severity of mental illness.
HoNOS comprises 12 scales, and each scale is given a value between 0 and 4 by the clinician:
(1) Behavioural disturbance
(2) Non-accidental self-injury
(3) Problem drinking or drug use
(4) Cognitive problems
(5) Problems related to physical illness or disability
(6) Problems associated with hallucinations and delusions
(7) Problems associated with depressive symptoms
(8) Other mental and behavioural problems
(9) Problems with social or supportive relationships
(10) Problems with activities of daily living
(11) Overall problems with living conditions
(12) Problems with work and leisure activities and the quality of the daytime environment.
Matching of participants is essential to avoid bias (Box 6). The Glick et al trial (Glick Reference Glick, Hargreaves and Goldfield1974) reported important differences between groups, including education, socioeconomic status, premorbid adjustment and mean dosage of chlorpromazine equivalent. The study's authors found it difficult to estimate the degree of ‘confounding effect’ exerted by these differences. Statistical correction for a confounding variable is theoretically possible through regression analysis, but there was no indication that this was attempted.
Matching means pairing (or similarity) between participants from comparison groups in the values of the matching variable(s). These matching variables are determined on the basis of their potential association with the outcome (usually the primary outcome).
Matching aims to reduce bias due to baseline group differences, thereby reducing the variability, and increasing the precision, of the group comparisons (Simon Reference Simon and Chinchilli2007).
Search strategy
Other strategies to make a thorough search for trials that the review authors should have considered include foreign language literature, grey literature and references of references. Foreign language literature is particularly important here, considering the closure of large mental health institutions in North America and Europe since the 1960s.
The small-study effect
Since fewer than 10 studies were included in this review, using a funnel plot to assess for reporting/publication bias was not appropriate (Higgins Reference Higgins, Thomas and Chandler2020). Such bias could lead to a ‘small-study effect’, a phenomenon in which estimates of intervention effects in small studies tend to be greater than in large ones. Small-study effects are specifically relevant to this review, given the large difference in sample size between some studies. The random-effects model weights studies relatively equally (Higgins Reference Higgins, Thomas and Chandler2020), which could enhance the small-study effect. This is a problem not seen in the fixed-effect model, which the review authors chose to use. It would have been helpful if results were expressed using both models to see whether a small-study effect was present.
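The difference in weighting between the two models can be sketched in Python as follows. The example assumes inverse-variance weights and the DerSimonian and Laird estimate of between-study variance (tau²), which is one common random-effects method and not necessarily what the review authors would have used. The two studies, one large and one small, are hypothetical.

```python
def dersimonian_laird_tau2(effects, variances):
    """Between-study variance (tau²) by the DerSimonian and Laird method."""
    w = [1 / v for v in variances]
    pooled = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - pooled) ** 2 for wi, e in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - df) / c)


def relative_weights(variances, tau2=0.0):
    """Share of total weight per study; tau2 = 0 gives the fixed-effect model."""
    raw = [1 / (v + tau2) for v in variances]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical log risk ratios: a large precise study and a small imprecise one
effects = [-0.1, -1.2]
variances = [0.01, 0.25]

tau2 = dersimonian_laird_tau2(effects, variances)
fixed = relative_weights(variances)
random_effects = relative_weights(variances, tau2)
print(f"Fixed-effect weights: {fixed[0]:.0%} and {fixed[1]:.0%}")
print(f"Random-effects weights: {random_effects[0]:.0%} and {random_effects[1]:.0%}")
```

In this hypothetical case the small study's share of the weight rises substantially under the random-effects model, which is why a small-study effect would be more visible there.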
Heterogeneity
Heterogeneity was observed in the pooled analysis of readmission rates. Considering that the Kennedy & Hird trial (Kennedy Reference Kennedy and Hird1980) is a much smaller study than the Burhan trial (Burhan Reference Burhan1969) and included different categories of patient (patients with organic brain diseases and alcohol problems), clinical significance should be interpreted in that light.
External validity: old evidence for current practice
A striking observation in this Cochrane review is the age of the trials (1960–1980). The more recent change from large psychiatric institutions to smaller psychiatric units, and the reduction of mental health bed numbers, highlights that the current practice of psychiatry is different and community mental health services are utilised more efficiently. New studies in the current circumstances would better reflect the outcomes – or shortcomings – of the proposed interventions.
Not only were the studies old: the review itself was last updated in 2012. Is there any newer research in this area? Searching PubMed in December 2020 using similar parameters to the review revealed no new studies comparing outcomes for the two interventions. Research in this area is mainly focused on factors affecting length of admission and readmission. Obviously, this was not a comprehensive search, but it suggests a dearth of research in this area – confirming what had already been noted when the two updates of the Cochrane review in 2007 and 2012 failed to identify any new study. The lack of standard definitions for short- and long-term admission, the pressure on beds mentioned above, and the difference in presentation and outcome of different psychiatric diagnoses could provide an explanation for the paucity of evidence available.
Cost of care
No trial reported the cost of in-patient stay, indirect costs (such as travel) and intangible costs (such as inconvenience). It is worth noting the exclusion of four trials in the latest update because they compared day hospital care with in-patient stay and/or focused on ‘economic evaluation’. Data from these trials, however, could shed some light on the economic aspects.
Conclusions: is the available evidence sufficient for clinical practice?
This Cochrane review (Babalola Reference Babalola, Gormez and Alwan2014) provides low-quality evidence that short hospital admission of patients with severe mental illness does not increase the risk of death, readmission, worsening of mental state or reduced work attendance in comparison with long admission. There is also limited evidence that short admission could indeed be associated with a lower risk of unemployment and result in reduced delays in discharge from hospital.
This evidence is reassuring concerning the safety and outcome of brief admissions, but is in much need of an update. The age of the trials, the low quality of evidence from the review and the lack of differentiation for outcome measures in relation to diagnosis all speak volumes about the need for contemporary, well-designed, focused randomised controlled trials to inform current mental health in-patient practice.
Assessment of cost is key to decision-making, especially for healthcare policy makers. If studies of the cost to healthcare of short and long admissions are to be conducted, researchers should investigate direct, indirect and intangible costs to inform decision-making regarding the most efficient management strategies.
Data availability
Data availability is not applicable to this article as no new data were created or analysed in this study.
Acknowledgement
I thank Dr Eman Elbakrawy for her help with formatting and citation of the references, and John Leake for proofreading.
Funding
This article received no specific grant from any funding agency, commercial or not-for-profit sectors.
Declaration of interest
None. This commentary reflects the author's views, but not necessarily the views of Berkshire Healthcare NHS Foundation Trust or Health Education England Thames Valley.