INTRODUCTION
The principal sources of infection for Legionnaires' disease (LD), an environmentally acquired pneumonia contracted through inhalation of aerosols contaminated with Legionella bacteria, have been identified as human-made, complex reservoirs of warm recirculated water such as cooling towers and complex hot-water systems [Reference Bhopal1–Reference Dondero3], evaporative condensers [Reference Addiss4, Reference Breiman5], whirlpool spas [Reference Benkel6, Reference Jernigan7] and showers [Reference Breiman8]. The specific environmental conditions under which contaminated aerosol is delivered at an infective dose, sometimes several hundred, or even thousand, metres from the source [Reference Addiss4, Reference Dunn9, Reference Nguyen10], however, remain poorly understood. Community-acquired LD, both in sporadic (i.e. cases not part of a recognized cluster) and outbreak form, shows marked spatial and temporal variations in incidence. Our previous work in Glasgow, Scotland, for example, has shown a multiplicity of clusters in time and space, and an association between disease incidence and distance of home residence from a cooling tower [Reference Dunn9, Reference Bhopal, Diggle and Rowlingson11]. Until recently little attention has been given to the specific meteorological conditions which are optimal for transport and delivery of an infective dose of Legionella-contaminated aerosol. Earlier analysis of data for Glasgow shows a higher incidence of LD in autumn and early winter [Reference Bhopal and Fallon12] which contrasts with more recent findings for the northeastern USA [Reference Fisman13], England and Wales [Reference Ricketts14], and The Netherlands [Reference Joseph and van der Sande15, Reference Karagiannis, Brandsema and van der Sande16] where cases have been reported to occur predominantly in summer. Karagiannis et al. [Reference Karagiannis, Brandsema and van der Sande16] found that warm (but not the hottest), humid weather conditions in summer in The Netherlands coincided with the highest LD incidence. Hicks et al. [Reference Hicks17] recently reported that cases of legionellosis in the USA for the decade 2000–2009 occurred mostly in summer and early autumn (June–October) and, similarly, a study in Toronto, Canada showed legionellosis cases to peak during late summer or early autumn [Reference Ng18]. Cooling tower-associated outbreaks in Australia have been shown to be most frequent in April (autumn in the southern hemisphere) [Reference Bentham and Broadbent19].
More specifically Fisman et al. [Reference Fisman13] demonstrated that for the Philadelphia metropolitan region of the USA incidence of legionellosis was positively associated with relative humidity and precipitation, and negatively associated with average wind speed in the period 6–10 days before onset. García-Fulgueiras et al. [Reference García-Fulgueiras20] noted how low wind speeds (average 9 kph) in Murcia, Spain engendered dispersal of aerosols while other work has suggested a relationship between Legionella survival/LD incidence and increased humidity [Reference Addiss4, Reference Ricketts14, Reference Berendt21] and between legionellosis incidence and increased rainfall [Reference Hicks22]. In contrast, and by measuring concentrations of Legionella species in air samples around aeration ponds at a biological treatment plant in Norway, Blatny et al. [Reference Blatny23] found the highest concentrations during cloudy weather and the lowest during rain. Sala Ferré et al. [Reference Sala Ferré24] suggest that relatively high temperatures (mean 14·4 °C), high humidity (mean 83%) and low mean wind speed (3·6 kph), as well as flat terrain for an area north of Barcelona, Spain could help to explain airborne dispersal of contaminated aerosol from an industrial cooling tower and hence a community outbreak in October 2005. Ng et al. [Reference Ng18] found evidence for an increase in odds of legionellosis occurring with increasing humidity in Toronto, Canada but they identified local hydrological changes, particularly decreases in river flow, to be the strongest contributors to disease risk.
AIMS AND DATASETS
The aim of this paper is to show how statistical modelling can be used to investigate the potential short-term associations between LD occurrence in Glasgow, Scotland and local meteorological conditions. Based on findings cited in the literature above, the epidemiology of LD [Reference Addiss4, Reference Bhopal, Diggle and Rowlingson11, Reference O'Brien and Bhopal25], laboratory data [Reference Berendt21] and the specific seasonal patterns found in Glasgow, we focus on three meteorological variables: air temperature, wind speed and relative humidity. All model fitting was performed using the software R, version 2.12 [26] and freely available add-on packages (the R code for the analyses is available from the authors; however, the case data are not freely distributable).
We used a subset of the case data reported on previously [Reference Bhopal, Diggle and Rowlingson11] for patients diagnosed with community-acquired, non-travel, non-outbreak LD in Glasgow. As part of this earlier work a list of cases was compiled and validated using a range of external information and a case definition conforming to widely agreed principles [Reference Bhopal27]. The case data are therefore drawn from a historical, ad hoc and, for Scotland, unusual study which collected data of sufficient detail and resolution to allow the application of the advanced statistical methods discussed here.
During the period for which case data were available, 1979–1986, there were 120 cases with a Glasgow postcode. Age, sex and grid reference of place of residence were also available for these cases. Of these 120 cases, 78 had a known exact date of onset and these formed our case dataset.
Daily weather data were made available for the period of interest for Glasgow airport, located about 7 miles from the city centre. The main variables of interest comprised daily relative humidity (per cent), daily mean air temperature (tenths of 1 °C, and derived as the mean of the daily maximum and the nightly minimum) and daily mean wind speed (tenths of a knot, 0–24 h GMT).
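Because the weather records are stored in tenths of a unit, and the studies cited in the Introduction quote wind speeds in kph, a small conversion step helps when comparing values. The following is a minimal sketch only: the function names and the example readings are hypothetical and are not taken from the study data.

```python
# Convert the stored units (tenths of 1 degree C, tenths of a knot) to
# degrees C and km/h. Function names and example values are hypothetical.

KNOT_TO_KMH = 1.852  # one knot is defined as exactly 1.852 km/h


def temp_tenths_to_celsius(value_tenths):
    """Convert a temperature recorded in tenths of 1 degree C to degrees C."""
    return value_tenths / 10.0


def wind_tenths_knot_to_kmh(value_tenths):
    """Convert a wind speed recorded in tenths of a knot to km/h."""
    return value_tenths / 10.0 * KNOT_TO_KMH


# Hypothetical daily record: 14.4 degrees C and 10.0 knots.
example_temp_c = temp_tenths_to_celsius(144)
example_wind_kmh = wind_tenths_knot_to_kmh(100)
print(example_temp_c, round(example_wind_kmh, 2))
```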
STATISTICAL MODELLING AND RESULTS
Missing values
Of the 42 LD cases which were excluded from our analysis, month and year of onset were available for 41 and year only for the remaining case. The age range for these 42 cases was 14–84 years (mean 58·0 years) and there were 28 males (66·67%) and 14 females (33·33%). This compares with an age range of 25–81 years (mean 54·7 years) and a male:female ratio of 55:23 (70·51/29·49%) for the 78 cases considered here. In terms of temporal proportions, cases without exact date of onset (n = 42) occurred most frequently in the years 1983 (26·2%) and 1984 (21·4%) and, for cases with only month and year available (n = 41), in the months of September (17·07%) and November (26·83%); for the 78 cases used here corresponding years and months are 1984 (38·5%) and 1985 (30·8%), and October (16·67%), November (15·38%) and December (17·95%).
Temporal patterns
For the 8-year period of interest, 1 January 1979 to 31 December 1986 (n = 2922 days), 75 days had at least one LD case: 72 days had one case each and 3 days had two cases each, while the remaining 2847 days had no cases. Using kernel smoothing [Reference Silverman28] we examined the detailed temporal distribution of the 78 LD cases; the resulting kernel-smoothed density curve is shown in Figure 1. For presentation purposes the y-coordinates of the case points are shown in this figure as random offsets (‘jitter’) to avoid overlap. The distribution indicates temporal variation and a large increase in cases during the winter of 1984/1985.
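The day and case counts quoted above can be checked directly, and the idea behind the kernel-smoothed curve can be illustrated with a short sketch. The Gaussian kernel, the 60-day bandwidth and the onset dates below are assumptions chosen for illustration; they are not the kernel, bandwidth or data used for Figure 1.

```python
from datetime import date
from math import exp, pi, sqrt

START, END = date(1979, 1, 1), date(1986, 12, 31)
N_DAYS = (END - START).days + 1  # inclusive of both end dates

# Counts reported in the text: days with 0, 1 and 2 cases.
DAYS_BY_COUNT = {0: 2847, 1: 72, 2: 3}
TOTAL_CASES = sum(count * days for count, days in DAYS_BY_COUNT.items())


def gaussian_kde(onset_days, bandwidth):
    """Return a function giving a Gaussian kernel density estimate over days."""
    norm = 1.0 / (len(onset_days) * bandwidth * sqrt(2.0 * pi))

    def density(t):
        return norm * sum(
            exp(-0.5 * ((t - d) / bandwidth) ** 2) for d in onset_days
        )

    return density


# Hypothetical onset dates (NOT the study data), as day offsets from START.
example_onsets = [
    (date(1981, 6, 1) - START).days,
    (date(1984, 11, 20) - START).days,
    (date(1984, 12, 15) - START).days,
    (date(1985, 1, 10) - START).days,
]
density = gaussian_kde(example_onsets, bandwidth=60.0)  # assumed bandwidth
print(N_DAYS, sum(DAYS_BY_COUNT.values()), TOTAL_CASES)
```

A wider bandwidth gives a smoother curve and a narrower one follows individual cases more closely, so the apparent sharpness of any peak depends on this choice.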
Figure 2 shows the temporal variation in the weather variables with the case data displayed as number of LD cases per month. Strong seasonal patterns are shown for relative humidity and air temperature, roughly anti-correlated, although there is no obvious visual pattern for the two wind variables.
Statistical modelling
We examined the distribution of the 78 LD cases by meteorological conditions in the incubation period (2–10 days before clinical onset of disease, median 6 days) by using a number of different statistical models. We first calculated a moving average (MA) of the weather variables over a 9-day window and fitted a series of Poisson log-linear regression models [Reference McCullagh and Nelder29] to the number of cases on each day (including the days with 0 cases). The results of fitting the first of these models, with temperature (TempMA), relative humidity (RHMA) and wind speed (SWindMA) each lagged by 2 days, are shown in Table 1. This shows the associations of cases with relative humidity and wind speed to be significantly different from zero. There is a positive association with increasing humidity and a negative association with increasing wind speed.
Table 1 notes: s.e., Standard error; MA, moving average.
Deviances: 573·0201 [2911 degrees of freedom (d.f.)] to 560·1568 (2908 d.f.).
Change: 12·8633 on 3 d.f., P = 0·0049.
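The construction of a 9-day moving average lagged by 2 days can be sketched as follows. The exact windowing used in the analysis is not stated, so this sketch assumes that the value for a given day averages the 9 days ending 2 days earlier (days 2–10 before the index day, which matches the incubation period). Under that assumption the first 10 days of the record have no covariate value, leaving 2912 usable days, which would be consistent with the 2911 null degrees of freedom reported above. The function name and the input series are hypothetical.

```python
def lagged_moving_average(series, window=9, lag=2):
    """Average the `window` days ending `lag` days before each index day.

    Returns a list aligned with `series`; entries are None where the
    window would reach back before the start of the record.
    """
    out = []
    for t in range(len(series)):
        end = t - lag + 1      # exclusive end, so the last day used is t - lag
        start = end - window   # the first day used is t - lag - window + 1
        if start < 0:
            out.append(None)
        else:
            out.append(sum(series[start:end]) / window)
    return out


# Hypothetical daily series of the same length as the study period.
series = [float(t) for t in range(2922)]
ma = lagged_moving_average(series)
usable_days = sum(value is not None for value in ma)
print(usable_days)
```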
In order to consider the effect of the large number of cases in the winter of 1984/1985, and also the seasonal variation within each year, we extended the first model to examine periodicity in the data. This also allowed for any changes over time in diagnostic and reporting procedures for LD cases. In this model we included a categorical variable defined as the 12-month period centred around 1 January of each year, while the overall yearly rise and fall was modelled with a sine wave of period 1 year. This is parameterized with sine and cosine terms giving two parameters which can be converted to amplitude and phase. The results of this analysis (not shown) demonstrate that only the period centred on January 1985 is significant, confirming the visual impression from Figure 1. Our ‘baseline’ model therefore comprises an intercept and an indicator variable that is 1 for the 12-month period centred on 1 January 1985 and 0 at all other times. The model results are highly significant, reflecting the increase in number of cases in the 1985 period.
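The conversion from sine and cosine coefficients to amplitude and phase follows from the identity b_sin·sin(ωt) + b_cos·cos(ωt) = A·sin(ωt + φ), with A = √(b_sin² + b_cos²) and φ = atan2(b_cos, b_sin). A minimal sketch is given below; the coefficient names, the 365·25-day period and the example values are assumptions for illustration and are not fitted estimates from the paper.

```python
from math import atan2, cos, hypot, pi, sin

PERIOD_DAYS = 365.25  # assumed length of the annual cycle
OMEGA = 2.0 * pi / PERIOD_DAYS


def harmonic(t, b_sin, b_cos):
    """Seasonal term written with separate sine and cosine coefficients."""
    return b_sin * sin(OMEGA * t) + b_cos * cos(OMEGA * t)


def amplitude_phase(b_sin, b_cos):
    """Return (A, phi) such that the seasonal term equals A*sin(OMEGA*t + phi)."""
    return hypot(b_sin, b_cos), atan2(b_cos, b_sin)


def shifted_sine(t, amplitude, phase):
    """The same seasonal term written as a single shifted sine wave."""
    return amplitude * sin(OMEGA * t + phase)


def peak_day(b_sin, b_cos):
    """Day within the cycle (t = 0 at the time origin) where the term peaks."""
    _, phase = amplitude_phase(b_sin, b_cos)
    return ((pi / 2.0 - phase) / OMEGA) % PERIOD_DAYS


# Hypothetical coefficients, for illustration only.
amp, phi = amplitude_phase(3.0, 4.0)
print(amp, round(phi, 4), round(peak_day(3.0, 4.0), 1))
```

In a log-linear model the seasonal term acts on the log of the expected count, so an amplitude A corresponds to the rate varying by a factor between exp(−A) and exp(A) over the year.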
When we extend the baseline model to incorporate a seasonal effect by using sine and cosine terms the parameters are again highly significant as would be expected from the pattern of cases. By adding into the baseline model the three weather variables using the 2-day lagged MA, relative humidity is shown to be significant (P = 0·0005). To complete this phase of the modelling, we extended the baseline model to include both seasonality and the weather variables (Table 2). This showed no significant relationships with weather and a weaker relationship with the sine-cosine term.
Table 2 notes: s.e., Standard error; MA, moving average.
Deviances: 573·0201 [2911 degrees of freedom (d.f.)] to 467·0563 (2905 d.f.).
Change: 105·9638 on 6 d.f., P < 0·001.
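As a plausibility check on the null deviances quoted in the table notes, the deviance of an intercept-only Poisson model can be computed from the distribution of daily counts alone, using D = 2 Σ [y log(y/μ) − (y − μ)] with μ the mean daily count. The sketch below assumes that all 75 case days remain in the analysis after the first days of the record are dropped for lagging; under that assumption it gives values close to the reported 573·0201 (2912 days), 573·3412 (2918 days) and 573·5015 (2921 days). The function names are hypothetical.

```python
from math import log


def poisson_null_deviance(day_counts):
    """Deviance of an intercept-only Poisson model.

    `day_counts` maps a daily case count to the number of days with that count.
    """
    n_days = sum(day_counts.values())
    mu = sum(count * days for count, days in day_counts.items()) / n_days
    deviance = 0.0
    for count, days in day_counts.items():
        # By convention y*log(y/mu) is taken as 0 when y = 0.
        log_term = count * log(count / mu) if count > 0 else 0.0
        deviance += days * 2.0 * (log_term - (count - mu))
    return deviance


def counts_for(n_days):
    """72 days with one case and 3 days with two cases (from the text).

    Assumes none of these case days fall among the days dropped for lagging.
    """
    return {0: n_days - 75, 1: 72, 2: 3}


for n in (2912, 2918, 2921):
    print(n, round(poisson_null_deviance(counts_for(n)), 4))
```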
Inclusion of either weather or sine-cosine terms provides a significantly better fit than does the baseline model, confirming within-year seasonality, while inclusion of both weather and sine-cosine variables gives a significantly better fit than any of the preceding models. However, weather and sine-cosine effects are partially confounded and we need to examine whether short-term fluctuations in weather about their longer-term seasonal patterns show association with LD incidence after adjusting for long-term seasonality in incidence by using the sine-cosine terms. We therefore fitted a simple sine-cosine model to the weather variables so that we could use the residuals as measures of these short-term fluctuations. Note that the responses in these models are the continuous weather variables, so we used normal linear models and not Poisson regression as we do when number of cases is the response. We also used the raw weather values instead of the smoothed moving averages. The residuals from these models demonstrate considerable autocorrelations which correspond to the time-scales over which weather differs consistently from the ‘norm’ for that time of year. In effect these are weather ‘anomalies’ which may include, for example, short ‘cold spells’ or a few windy days. Table 3 shows the results of a model which incorporates the 1985 effect, a sine-cosine term and the residuals from the weather variable models.
Table 3 notes: s.e., Standard error.
Deviances: 573·3412 [2917 degrees of freedom (d.f.)] to 464·8344 (2911 d.f.).
Change: 108·5067 on 6 d.f., P < 0·001.
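The step of regressing a weather variable on sine and cosine terms and keeping the residuals as ‘anomalies’ can be sketched with ordinary least squares. This is an illustrative implementation only: the 365·25-day period, the function names and the synthetic temperature series (a seasonal cycle with an added 5-day warm spell) are assumptions, and the normal equations are solved directly rather than with the routines used by the authors.

```python
from math import cos, pi, sin


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def seasonal_residuals(values, period=365.25):
    """Fit y = c0 + c1*sin(wt) + c2*cos(wt); return (coefficients, residuals)."""
    w = 2.0 * pi / period
    rows = [(1.0, sin(w * t), cos(w * t)) for t in range(len(values))]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * y for r, y in zip(rows, values)) for i in range(3)]
    coefs = solve_linear(xtx, xty)
    fitted = [sum(c * x for c, x in zip(coefs, r)) for r in rows]
    return coefs, [y - f for y, f in zip(values, fitted)]


# Synthetic daily temperatures (degrees C), NOT the study data:
# an annual cycle plus a hypothetical 5-day warm spell of +4 degrees.
W = 2.0 * pi / 365.25
temps = [9.0 - 6.0 * cos(W * t) for t in range(2922)]
for t in range(1000, 1005):
    temps[t] += 4.0
coefs, resid = seasonal_residuals(temps)
print([round(c, 3) for c in coefs], round(resid[1002], 2))
```

In this synthetic example the residuals are close to zero on ordinary days and close to +4 during the warm spell, which is the sense in which they measure departures from the seasonal norm.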
Finally, in order to incorporate time-lagged weather effects we fitted the model with a range of time lags. In this model the sine-cosine and 1985 effect parameters do not change significantly. We plotted the parameters of the three weather variables with 2 standard error intervals against a range of 12 lags in order to demonstrate significant non-zero values in the incubation period (Fig. 3). Positive lag values (1–9) represent days immediately prior to date of disease onset and, for illustrative purposes, negative lags (−1 and −2) correspond to lead times where the value of the predictive variable is taken for the days immediately after date of onset. The most significant correlation is that for temperature at a lag of 1 day prior to onset which gives an increase in incidence with increasing positive residual (Fig. 3). There is also weaker significance at 3 days prior to onset which persists at 0 days and at small lead times. Relative humidity and wind speed are much less significant and are therefore removed from the analysis to produce a final model. As shown in Table 4 there is a significant association between LD incidence and air temperature residual lagged by 1 day prior to onset (P = 0·0014). The residuals in this final model are uncorrelated and the residual deviance is much smaller than the residual degrees of freedom, indicating an absence of over-dispersion. None of the other Poisson models shows over-dispersion.
Table 4 notes: Deviances: 573·5015 (2920 d.f.) to 460·3270 (2916 d.f.).
Change: 113·1745 on 4 d.f., P < 0·001.
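The deviance changes reported for the four models can be checked against a chi-squared distribution with degrees of freedom equal to the number of added parameters. The sketch below uses the closed-form upper-tail probability for integer degrees of freedom; the function name is hypothetical, the model labels simply follow the order in which the deviances appear in the text, and treating the change in deviance as chi-squared is the usual large-sample approximation rather than a statement of the exact procedure used.

```python
from math import erfc, exp, gamma, sqrt


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable with integer df."""
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= half / j
            total += term
        return exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) + exp(-x/2) * sum_j (x/2)^(j - 1/2) / Gamma(j + 1/2)
    total = erfc(sqrt(half))
    term = sqrt(half) / gamma(1.5)
    for j in range(1, (df - 1) // 2 + 1):
        total += exp(-half) * term
        term *= half / (j + 0.5)
    return total


# (label, null deviance, null d.f., model deviance, model d.f.) as reported.
MODELS = [
    ("model 1", 573.0201, 2911, 560.1568, 2908),
    ("model 2", 573.0201, 2911, 467.0563, 2905),
    ("model 3", 573.3412, 2917, 464.8344, 2911),
    ("model 4", 573.5015, 2920, 460.3270, 2916),
]

results = []
for label, dev0, df0, dev1, df1 in MODELS:
    change, df = dev0 - dev1, df0 - df1
    results.append((label, change, df, chi2_sf(change, df)))
    print(label, round(change, 4), df, results[-1][3])
```

For the first model this gives a change of 12·8633 on 3 d.f. and a P value of about 0·0049, in agreement with the reported figures; the remaining three changes all give P values well below 0·001.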
DISCUSSION
The statistical approach developed in this paper has further advanced understanding of the role which weather plays in risk of LD. Our initial model showed that relative humidity was positively associated and wind speed was negatively associated with LD incidence in Glasgow, Scotland. This finding is consistent with work elsewhere [Reference Addiss4, Reference Fisman13–Reference Ricketts14, Reference García-Fulgueiras20–Reference Berendt21, Reference Sala Ferré24] and lends support to the suggestion that high levels of humidity favour survival of Legionella bacteria [Reference Addiss4, Reference Berendt21]. However, these associations were not significant once we adjusted for season and year.
We also explored the potential influence of short-term, unseasonable weather conditions or ‘anomalies’ and our findings show a small but significant association between LD incidence and mean air temperature residual on the day preceding date of onset. Because a lag of 1 day falls outside the incubation period of 2–10 days prior to onset, this result needs to be interpreted with care and we cannot say whether this association is causal. The association with mean air temperature residual at 3 days prior to, and immediately following, onset is almost certainly a side-effect of autocorrelation in the residual temperature values. When two time-series are autocorrelated, a direct relationship between the two at a particular lag typically generates indirect associations at other lags; hence the overall pattern of the cross-correlations is more important than their individual significance. In light of this, the contextual role of unseasonably high air temperatures is worthy of further investigation. Previous findings on the importance of air temperature indicate a potential role for higher seasonal temperatures although evidence is lacking for an association during the incubation period. Using mean monthly temperatures for states in the Mid-Atlantic region of the USA, Hicks et al. [Reference Hicks22] found statistically significant correlations between monthly case counts of legionellosis and temperature in the preceding month. Using negative binomial regression and controlling for state and year they reported a 2·8% increase in risk for legionellosis for a 1 °C increase in temperature. In the study by Karagiannis et al. [Reference Karagiannis, Brandsema and van der Sande16] mean weekly temperature was found to be associated with LD incidence in The Netherlands, with higher temperatures in the 2 weeks prior to exposure further contributing to higher incidence. For England and Wales Ricketts et al. [Reference Ricketts14] found a correlation between disease incidence and temperature 10–14 weeks prior to onset although this was dependent on both year and season (quarter); no consistently significant effect of temperature was shown for the incubation period. Fisman et al. [Reference Fisman13] similarly found no relationship between temperature during the incubation period and legionellosis incidence.
Our understanding of the complex relationships between ambient weather and risk of LD is complicated by the use of data of different spatial and temporal resolutions and quality in different settings, and in this context it is important to consider each study's limitations. First, in the present research we made the assumption that weather monitored for Glasgow airport pertains to the whole of the Glasgow postcode area. Although the airport weather station was the nearest available station monitoring daily data, its use may not account for smaller scale spatial variations within the urban environment. Additionally, and as indicated by Ricketts et al. [Reference Ricketts14], even more highly localized variations in aerosol concentration can occur for sources of Legionella bacteria such as spa pools. It is therefore difficult to comprehensively account for each and every variation in the meteorological conditions of a complex environment such as a large city. Further work is warranted to explore spatial dispersal mechanisms through techniques such as air quality dispersion modelling, the results of which can be interfaced to Geographical Information Systems (GIS) software [Reference Nguyen10, Reference Dunn and Kingham30, Reference Dunn31]. However, the nature of aerosol emissions from sources such as cooling towers and condensers may present greater complexities than, say, a single factory chimney whose emissions can be more readily modelled. Blatny et al. [Reference Blatny23] propose the use of computational fluid dynamics to estimate aerosol paths.
Second, suitability of the individual daily meteorological variables needs to be considered. Temporal dispersion of aerosols through the ambient atmosphere is complex, cooling tower mists being carried in different directions at different times of the day [Reference Blatny23, Reference Brown32]. Variations in weather patterns may therefore not be adequately characterized by summary daily measures such as the daily means which we used in our analyses of wind speed and air temperature. Similarly, relative humidity is measured as a single value recorded at 09:00 hours GMT which may hide diurnal variations caused, for example, by changes in temperature or by advection of air masses. These summary measures, therefore, may not be fully representative of the ambient conditions which determine aerosol dispersion.
Third, it is important to consider potential errors in diagnosis and reporting of LD cases. Our analysis is based on a relatively small number of cases and pertains to a historical time period. However, although dated, the dataset has been validated as part of earlier work and meets the objective of the present study to use a number of different statistical models to examine the potential relationship with meteorological conditions pertaining to the time of diagnosis. There is, however, a need for analysis of more contemporary case data for Scotland and also for our methods to be applied to datasets for other settings, and to this end we welcome collaborative opportunities.
In addition to the 78 cases which form the subject of analysis here, an additional 42 cases were excluded because the exact date of onset was missing. This level of completion (65%) is not unexpected given the historical nature of the dataset. For more current data Fisman et al. [Reference Fisman13], for instance, report that 85% of legionellosis cases had a documented date of symptoms onset. The case dataset shows a large and statistically significant excess of LD cases in the winter of 1984/1985. There was an outbreak of LD in Glasgow in summer 1984 [33] and it is possible that heightened awareness and increased vigilance may have resulted in additional diagnosis and reporting of cases in subsequent months. An alternative explanation may relate to the possibility of a more highly pathogenic strain of the bacterium being present at that time [Reference Ricketts14, Reference Harrison34]. Whatever the explanation for this increase, our analysis has attempted to account for the unusually large number of cases which occurred at this time by including a factor for this period.
Finally, it is important to consider the spatial boundary of the case data. We have limited our dataset to cases which were identified by a Glasgow area (G) home address postcode. Of other cases in Scotland the postcode area PA (Paisley) is located close to Glasgow city (to the north and west) although the number of cases in this postcode area which were categorized as community-acquired, non-travel, non-outbreak and with an exact date of onset is very small (n = 3).
Recent years have witnessed a growth in interest in investigating the meteorological conditions which may be associated with raised incidences of LD. Identification of such conditions is important since it not only facilitates our understanding of the environments in which Legionella bacteria survive and are dispersed, but may also aid disease detection and prediction. Findings from this research avenue are likely to become increasingly pertinent to public health policy and practice given changes in global climate [Reference Ricketts14, Reference Karagiannis, Brandsema and van der Sande16]. Our approach in this paper has highlighted how cross-disciplinary collaboration provides a valuable platform for enhancing our understanding of the specific meteorological conditions which may be most pertinent under such changing climates.
Despite, or perhaps because of, the recent flurry of research activity around the relationship between ambient weather conditions and the onset of LD a number of different findings have emerged as well as some similarities. Our analysis of daily meteorological data for Glasgow, Scotland has indicated a potentially complex relationship between ambient weather conditions and LD incidence. There is evidence of temporal variation over an 8-year period with a large peak in cases in one particular winter. Although our initial, simple models indicated a significant association between disease incidence and both relative humidity and wind speed conditions during the preceding days of incubation, further developments to account for seasonal effects show no significant relationships with weather variables. After adjusting for long-term seasonality in disease incidence, short-term changes in weather were examined for potential associations with disease incidence by using the residuals from the weather variable models; this demonstrates a significant association with the mean air temperature residual, although most notably outside of the incubation period, and not for relative humidity or wind speed. Fisman et al.'s [13, p. 2072] statement that ‘the seasonality of non-vectorborne infectious diseases remains poorly understood’ is still relevant, therefore, and there is need for additional detailed analysis of the role of the three weather variables considered here. By adding to a sparse but important area of research in airborne infections our methods and findings form a strong foundation for such further work.
ACKNOWLEDGEMENTS
The Legionnaires' disease dataset was collected with financial support from the former Greater Glasgow Health Board and the former Pneumonia Research Trust. We thank the late Dr R. J. Fallon for his invaluable role in establishing a register of cases. We thank the patients, general practitioners, hospital consultants and medical records officers for help with compiling the database. We thank the Glasgow Weather Centre for meteorological data.
DECLARATION OF INTEREST
None.