Introduction
Dengue fever (DF) affects more than 100 million people every year and is one of the most important mosquito-borne diseases in the world [Reference Vanessa1]. DF is the main source of human deaths from vector-borne disease [Reference Gubler2]. The dengue virus is transmitted by a female mosquito of the Aedes aegypti species [Reference Cabrera3, Reference Gibbons and Vaughn4] and consists of four serotypes: DENV-1, DENV-2, DENV-3 and DENV-4 [Reference Gubler2, 5, Reference Chen6]. These serotypes can induce a range of symptoms, including the most dangerous stage: dengue haemorrhagic fever (DHF) which is characterised by circulatory collapse and death [Reference Gibbons and Vaughn4, Reference Raja and Devi7].
In a Bayesian analysis, estimates, predictions and inference are based on posterior distributions. Bayes’ theorem states that this posterior distribution, which expresses the probability of a parameter given the data, equals the multiplication of the likelihood function (the probability of the data given the parameters) with the prior probability distribution for these parameters divided by the probability of the data [Reference Austin8]. This contrasts with a frequentist approach, which derives parameter estimates from the likelihood alone. Bayesian statistical regression models have been used to effectively describe epidemiological data characterised by spatial and spatio-temporal structure [Reference Blangiardo9]. The fundamental feature of Bayesian approaches is the use of probability for measuring uncertainty in inferences [Reference Gelman10]. The major appeal of these approaches is in considering uncertainty in the predictions or estimates and the straightforward incorporation of spatial and temporal structure as prior distributions [Reference Blangiardo11]. This approach also allows one to take into account a much wider class of conceptual models than non-Bayesian approaches [Reference Dunson12]. The priors can also be used to incorporate information from preceding studies [Reference Blangiardo9].
Previous systematic reviews of DF models have been conducted, but their objectives were different. Three papers have focused on assessing the influence of climate change on transmission of dengue [Reference Naish13–Reference Fan15], one of which specifically considered the effect of temperature [Reference Fan15]. Some reviews have examined the epidemiology of dengue in a certain country, for example, in Thailand [Reference Limkittikul, Brett and L'Azou16], Saudi Arabia [Reference Messina17] and four high-income countries [Reference Viennet18]. One review reported on different types of modelling methods for early warning systems [Reference Vanessa1] and the different kinds of spatial methods in dengue transmission [Reference Oliveira, Ribeiro and Castillo-Salgado19], respectively.
There has only been one systematic review paper that considered spatial and spatio-temporal modelling approaches to generate a risk map of dengue [Reference Louis20]. This paper identified important predictors for categorical and continuous risks and four types of maps (descriptive, validated, predictive and early warning system). Twelve modelling approaches were identified in 26 publications. The most popular were spatial analyses of case clusters, measures of spatial autocorrelation and logistic regression and multinomial models. However, these methods were not described and Bayesian methods were not discussed.
Despite the appeal of Bayesian models and their popularity in epidemiology, to our knowledge, there are no published systematic review articles of spatial or spatio-temporal modelling of DF using Bayesian methods. The objectives of this systematic review were to identify and review published Bayesian spatial and spatio-temporal models that have been applied to DF, to assess analytical methods including the structure of the model, the use of prior distributions and the inclusion of covariates, and then to identify opportunities for future research.
Methods
Search terms and databases
The methodology for this review included a search strategy, and inclusion and exclusion criteria based on the preferred reporting items for systematic reviews and meta analyses (PRISMA) guidelines [Reference Liberati21, Reference Moher22]. Biomedical databases (Medline (via Ebscohost) and PubMed), science databases (ScienceDirect, Scopus and Web of Science) and an all disciplines database (ProQuest) were searched electronically in December 2017. A manual search through reference lists of articles was also undertaken. The literature search was limited to refereed journal articles published from January 2000 to November 2017 in English. Databases were searched with the same keywords, dengue and spati* and Bayesian. The search spati* retrieved spatial, spatio-temporal and spatiotemporal. A Boolean operator was implemented to link the keywords. All results were combined and the duplicates removed using EndNote. The titles and abstracts of articles found through keyword searches were screened first by one author and then the papers identified were evaluated through reading the full text and selected according to the inclusion criteria. This stage was performed by two authors independently. Disagreement between authors was resolved by discussion and consensus.
Inclusion and exclusion criteria
The inclusion criteria were as follows. First, articles had to be published in a peer-reviewed journal. Second, studies were included if they used Bayesian spatial models or Bayesian spatio-temporal models to model DF. A spatial model was defined as one that explicitly included a geographic index for areas or observations and that then linked these areas in some manner, such as through a random-effects term. Similarly, a temporal model was defined as one that explicitly included a time index. Only English articles were included. No geographical restrictions were applied. The exclusion criteria were as follows: models that were not applied to dengue, non-spatial, non-Bayesian models, modelling of dengue vectors, dengue virus phenotypes, review papers and conference/workshop proceedings. Bayesian models that only considered a temporal component were also excluded. Modelling of dengue mosquito vectors and their egg numbers [Reference Costa23], rather than cases of DF, were excluded. Similarly, modelling the dengue virus was excluded if it was generally about the spread of the dengue virus (via infected humans or mosquitos) and the occurrence of viral genetic diversity. Review papers were read and pertinent studies included, but not the review paper itself. This systematic review is registered on PROSPERO (reference: CRD42018084054).
Quality assessment
All papers fulfilling the inclusion criteria were critically appraised by two reviewers independently to identify the strengths and weaknesses of each paper. Any disagreement between reviewers was resolved by consensus. The critical appraisal was performed using the adapted tool for assessment of modelling study quality and risk of bias by Harris et al. [Reference Harris24] which is a modification of that proposed by Fone et al. [Reference Fone25] (Supplementary Table S1). Part A assesses screening questions, Part B checks model validity and Part C assesses the overall results and study conclusions. The adapted tool contains questions for each of eight criteria and clear guidance for scoring. Papers were scored from 0 (poor) to 2 (good) on each of the eight criteria, giving a maximum score of 16 points. A quality level of ‘low’ (<8), ‘medium’ (8–10), ‘high’ (11–13) or ‘very high’ (>13) was assigned to each paper based on the overall score.
Data extraction
All data were extracted and collected manually. Extracted data included first author, year of publication, study area, time period of study, dengue data (number of cases, time interval collection period and number of areas), covariate data, objectives, analytical method, model structure, key findings, further studies and software. Details of covariates used in the included papers were also extracted.
Results
Literature search
The flow chart of this literature search is given in Figure 1, and the list of detailed content of studies is given in Supplementary Table S2. Based on keyword searches, 26 articles from Medline (via Ebscohost), 486 from ProQuest, 26 from PubMed, six from ScienceDirect, 44 from Scopus and 42 from Web of Science were obtained. Five additional records were identified through manual searches. From the 635 citations initially identified, 489 potential relevant articles remained after removal of duplicates (146 duplicate articles). Screening of titles and abstracts removed an additional 437 papers. A further 21 of the 52 remaining articles were excluded for not meeting the inclusion criteria after reviewing the full article. As a result, 31 articles were finally included in the review.
Dengue data
Time intervals and geographic regions
In this review, one study used daily reported dengue cases, eight studies used weekly dengue cases, 15 studies used monthly cases of dengue, one study used quarterly data and six studies used annual dengue cases. The longest period of study was for 384 months (32 years), while the shortest period of study was for 3 months (91 days), with an average of 7 years and a median of 4 years. Eighteen studies used dengue datasets with <7 years, while 13 studies used datasets of 7 years or greater (Supplementary Table S2).
The largest number of districts was 1065 and the smallest number was 10 districts. There were 10 studies in Brazil, six studies in Indonesia, four studies in Taiwan, three studies in Thailand and Australia, respectively, two studies in Colombia and one study in each of Malaysia, China and Puerto Rico.
Covariate data
The type and number of covariates included in the models varied widely among the studies reviewed (Table 1). Six categories of covariates were identified, namely climatic, demographic, socio-economic, entomological, geographic and temporal. Although four studies [Reference Lowe26–Reference Wijayanti29] examined four of these categories of covariates, most studies used two or three categories while three studies did not include any covariates.
a Refers to numbers in Table 2.
a Spatial autoregressive (SAR).
b Conditional autoregressive (CAR).
c Cubic spline smoothing (CSS).
d First-order autoregressive (AR(1)).
e First-order random walk (RW1).
f Besag–York–Mollié (BYM).
g Penalised splines (P-splines).
h Bayesian Maximum Entropy (BME).
Climatic covariates
More than half of the studies (20 out of 31) used climatic variables in modelling DF disease. The most commonly used predictors included temperature and precipitation. Two studies additionally included the El Niño Southern Oscillation Index (SOI) [Reference Lowe26, Reference Yu30]. One study used temperature, precipitation and Oceanic Niño Index (ONI) [Reference Lowe27]. Mean temperature, minimum temperature and maximum temperature [Reference Vazquez-Prokopec31], night-time temperature and day-time temperature [Reference Wijayanti29] and monthly mean maximum temperature, mean minimum temperature and cumulative precipitation [Reference Johansson, Dominici and Glass32] were also included as climatic predictors in the reviewed models. Five studies used precipitation only [Reference Fernandes, Schmidt and Migon33–37].
Demographic covariates
Out of 31 studies, 13 included demographic data. Most studies only used one of the eight categories of demographic variables considered in Table 1, while only one study used four categories: population density, age structure, mean age of population and household density [Reference Kikuti38]. Population density was the most common demographic variable used in modelling DF.
Socio-economic covariates
Socio-economic data were used in seven studies [Reference Wijayanti29, Reference Kikuti38–Reference Ferreira and Schmidt43]. The most common socio-economic variable was educational level (seven studies), followed by income and garbage collection (five studies each). One study used seven categories of socio-economic variables [Reference Kikuti38].
Entomological covariates
Only six studies incorporated entomological (mosquito) data in their models [Reference Yu30, Reference Vazquez-Prokopec31, Reference Jaya36, Reference Vargas42, Reference Pepin44, Reference Zhu45]. A Breteau index (BI) which is defined as the number of positive containers (i.e. containing A. aegypti larvae) per 100 houses inspected, was used as a predictor in two studies [Reference Yu30, Reference Zhu45] and a House Index (HI), which is defined as percentage of houses infested with larvae and/or pupae, was used as a predictor in another two studies [Reference Jaya36, Reference Vargas42] to identify dengue transmission areas. A larva-free home index and a healthy housing index were included to determine their impact on the DF relative risk [Reference Jaya36]. Indoor residual spraying has also been considered in modelling DF [Reference Vazquez-Prokopec31].
Geographic covariates
Nine studies used geographic characteristics in their model. Altitude [Reference Lowe26–Reference Lowe28] and mean vegetation index [Reference Wijayanti29, Reference Martínez-Bello, Lopez-Quilez and Alexander Torres46, Reference Martínez-Bello, López-Quílez and Prieto47] were the most common features used. Out of nine geographic variables applied, most studies used only one indicator. Only one study [Reference Kikuti38] used three indicators, which were the mean elevation to sea level, elevation range and distance from the census tracts (CTs) centroid to the health service.
Temporal covariates
Temporal data were used in 10 studies. Most studies included a temporal lag in the climate data [Reference Lowe26–Reference Lowe28, Reference Yu30, Reference Johansson, Dominici and Glass32, Reference Pepin44, Reference Lowe48, Reference Chien and Yu49] and only two studies included temporal entomological data [Reference Yu30, Reference Pepin44]. Models have included time in years as a categorical variable to evaluate the dynamics of dengue cases [Reference Restrepo, Baker and Clements50] and also non-linear temporal trends [Reference Vazquez-Prokopec31].
Analytical method
A variety of Bayesian spatial and spatio-temporal approaches were used in modelling DF. Most studies adopted a fully Bayesian model with a spatially structured random effect using a CAR prior structure to investigate the relationship between the risk of dengue and selected covariates [Reference Jaya36, Reference Kikuti38–Reference Costa, Donalisio and Silveira41, Reference Ferreira and Schmidt43, Reference Martínez-Bello, Lopez-Quilez and Alexander Torres46]. Spatial empirical Bayes smoothing was used for two studies to examine the spatial distribution of dengue [Reference Vargas42, Reference Hu51].
Generalised linear mixed models (GLMMs) with proper CAR spatial random effects were applied to develop disease maps, with dengue incidence data assumed to be Poisson [Reference Lekdee and Ingsrisawang52]. Temporal components were additionally incorporated, either as a temporal covariate [Reference Pepin44, Reference Restrepo, Baker and Clements50], or via a GLMM with spatial and temporal random effects and temporal covariates [Reference Lowe26, Reference Lowe27, Reference Lowe48]. Among the selected studies, only two studies used a GLMM with spatial, temporal and spatio-temporal random effects [Reference Wijayanti29, Reference Martínez-Bello, López-Quílez and Prieto47], while one included these components along with an additional temporal covariate [Reference Lowe28]. Other GLMM spatio-temporal random-effects models with incorporation of a temporal trend have also been developed [Reference Sani35, 37]. Two studies used a GLMM zero-inflated model [Reference Fernandes, Schmidt and Migon33, 53].
Alternative models included estimation of relative risk for the transmission of dengue disease based on discrete time and space via a susceptible–infective-recovered model for human populations; susceptible–infective model for mosquito populations (SIR-SI) [Reference Samat and Percy54], prediction of spread of DF using Bayesian maximum entropy (BME) [Reference Yu30, Reference Yu55, Reference Yu, Lee and Chien56], and spatio-temporal quasi-Poisson model based on a DLNM (distributed lag non-linear model) [Reference Chien and Yu49], STARM (spatial–temporal autologistic regression model) [Reference Astutik34], hierarchical model with adaptive natural cubic spline [Reference Johansson, Dominici and Glass32], a semi-parametric Bayesian STAR (structured additive regression) model [Reference Vazquez-Prokopec31] and a transmission model based on Ross–Macdonald theory [Reference Zhu45]. The analytical methods used across all included studies are summarised in Supplementary Table S3 and summary of the structure of the spatio-temporal models discussed in the reviewed paper can be seen in Table 2. These models are explained in more detail as follows.
Spatial models
Several spatial models have been developed and applied to DF, namely empirical Bayes approaches and fully Bayes GLMM with a spatial CAR prior.
Empirical Bayes approaches: An empirical Bayes method is an approximation to the fully Bayesian method. In an empirical Bayes approach, the prior parameters are estimated from the data, while in a fully Bayesian analysis, the prior distribution is completely specified before observing any data [Reference Kumar57].
Empirical Bayes spatial smoothing for dengue incidence data has been used to categorise the high-risk and low-risk areas in Queensland [Reference Hu51]. Local empirical Bayes was applied to investigate the relationship between the HI, dengue incidence and socio-demographic variables [Reference Vargas42]. The authors concluded that there is a positive correlation between HI and Bayesian dengue incidence rate. The highest dengue risk regions were situated in the areas which had the highest population densities and were close to the major highways.
GLMM with spatial random effects: A GLMM with spatial random effects has been applied in seven studies [Reference Jaya36, Reference Kikuti38–Reference Costa, Donalisio and Silveira41, Reference Ferreira and Schmidt43, Reference Martínez-Bello, Lopez-Quilez and Alexander Torres46]. The general model is formulated as follows:
where y i is the number of dengue cases in i = 1, …, I areas; e i and θ i are, respectively, the expected number of dengue cases in area i and the log relative risk of dengue; α is the overall level of relative risk; β = (β 1, β 2, …β p) represent the coefficient of the covariates; u i is a spatially structured random effect with CAR prior structure and v i is a spatially unstructured random effect with mean zero and variance $\sigma _v^2 $. All authors used an intrinsic Gaussian CAR (ICAR) prior and adopted a binary neighbourhood weighting. An ICAR model assumes the areas k and i are neighbours if both share a common border. This can be expressed as follows:
ω ki = 1 if k, i are adjacent, ω ki = 0 otherwise [Reference Riebler58, Reference Lawson59]. This prior is the most common Gaussian Markov random field [Reference Botella-Rocamora, López-Quílez and Martinez-Beneito60] and is an improper prior [Reference Lawson and MacNab61]. Allowing for spatial autocorrelation through this prior can improve model fit [Reference Kikuti38]. However, the choice of neighbourhood structure needs to be carefully considered as it could impact on the significance of some covariates [Reference Ferreira and Schmidt43]. One study examined two additional types of neighbourhood structure matrices, namely, weighted by the length of the boundary and by boundary and barriers [Reference Ferreira and Schmidt43].
Martínez-Bello et al. [Reference Martínez-Bello, Lopez-Quilez and Alexander Torres46] compared CAR BYM (Besag, York and Mollié) [Reference Besag, York and Mollié62] prior and Leroux CAR prior [Reference Leroux, Lei, Breslow, Halloran and Berry63] for spatially structured random effects for estimating relative risk of dengue. They found that the CAR BYM prior was better than the Leroux CAR prior.
Spatio-temporal models
GLMM over space and time with spatial random effects: A GLMM indexed by space and time and with spatial random effects has been proposed in one study [Reference Lekdee and Ingsrisawang52] to develop a disease map and identify any association between dengue incidence, rainfall and temperature. The proposed model is expressed as
where y ij are the number of dengue cases in area i = 1, …, I, and time j = 1, …, J; μ ij is mean cases, log(popi) is the offset representing the total population in each area. Rain and temp are the total rainfall and temperature, respectively, in each area and time, and u i are proper CAR spatial random effects. A proper CAR is a variant of the ICAR prior, with an additional term for spatial autocorrelation ρ in the conditional expectation [Reference Kandhasamy and Ghosh64], as follows:
If ρ = 1 then the model is the ICAR in equation (1).
GLMM with spatial random effects + temporal covariate: An alternative representation of a GLMM with spatial random effects and the inclusion of temporal covariates has been proposed in two studies [Reference Pepin44, Reference Restrepo, Baker and Clements50]. Restrepo et al. [Reference Restrepo, Baker and Clements50] found that the convolution model was the preferred model (this includes both u i and v i) over models containing only the uncorrelated termv i or the ICAR term u i, and precipitation was the most significant predictor of dengue risk.
Pepin et al. [Reference Pepin44] proposed a different GLMM formulation to assess the role of city-wide vector data in forecasting DF cases. In this model, they included the rate of cases in neighbourhood i at time j, mosquito density data, fixed scaling factors, lagged time for specific variables and different weighting functions between-neighbourhood effects to illustrate patterns of city-wide human movement which consists of economic value of the neighbourhood, population density and travel distance between neighbourhoods. Two scales of spatial disease data, that is, nearest-neighbourhood effects (local) and all between-neighbourhood effects (global) are compared to predict the association between mosquito density and human cases of dengue. Models that included global between-neighbourhood effects and two covariates (mosquito density and human cases of dengue) and their interaction were preferred.
GLMM with spatial and temporal random effects + temporal covariate: GLMMs with spatial and temporal random effects and a temporal covariate have been proposed in three studies [Reference Lowe26, Reference Lowe27, Reference Lowe48]. Lowe et al. [Reference Lowe26] compared a spatio-temporal GLM and a GLMM that includes random effects in the linear predictor and found that the latter model provided more accurate dengue predictions. In this model, the number of dengue cases y ij are assumed to be Poisson distributed with mean dengue count μ ij given by
where $u_i \sim {\rm CAR(}\sigma _u^2 {\rm )}$, $v_i \sim N{\rm (}0,\sigma _v^2 {\rm )}$, and φ j are the temporally autocorrelated random effects (j = 2, …, 12) with φ 1 = 0, and $\varphi _ j \sim N(\varphi _{j-1},\sigma _\varphi ^2 )$, j = 2, …, 12.
The variable climate factors x kij are: precipitation in the previous 1 and 2 months, temperature in the previous 1 and 2 months and Nino 3.4 in the previous 6 months. The variables w ki are: altitude and percentage of urban population.
Another spatio-temporal GLMM by Lowe et al. [Reference Lowe27] extended the model by Lowe et al. [Reference Lowe26] by adding more recent data and including log dengue standardised morbidity ratio in the previous 3 months (past dengue risk), spatially structured and unstructured random effects and a first-order autoregressive month effect. Here the DF counts y ij are assumed to have a negative binomial distribution to allow for overdispersion in observed dengue data. The authors compared this model with a simple model based on past dengue risk only. They found that the extended model improved dengue predictions.
Generalised linear and additive mixed models (GLMM/GAMM) were applied to measure the benefit of including climate function in the model [Reference Lowe48]. The response had a negative binomial distribution and the dengue relative risk models included a baseline model with season only, a seasonal–spatial model (inclusion of spatial structure and unstructured error), a seasonal–spatial climate-linear model (linear climate model) and a seasonal–spatial climate-non-linear model (non-linear climate model). The results showed that the model with linear and non-linear climatic functions explained 39% and 40%, respectively, of the variation in dengue relative risk. An additional 7% and 8% of the variation was explained by seasonal–spatial structure using linear and non-linear climatic functions, respectively.
GLMM with spatial, temporal and spatio-temporal random effects + temporal covariate: Lowe et al. [Reference Lowe28] also formulated another GLMM model using a negative binomial distribution for the dengue case counts to predict dengue epidemic in Brazil during the 2014 football tournament. This model has minor differences and extensions to their previous model [Reference Lowe27]: the inclusion of log dengue standardised morbidity ratio 4 months previously, a fixed effect for month, a random effect for month and the inclusion of a first-order autoregressive month effect for each zone. Their results showed that this model can forecast which cities have low, medium and high risk of dengue.
GLMM with spatial, temporal and spatio-temporal random effects: Bayesian spatial, temporal and spatio-temporal random-effects models have been used to determine factors that influence the risk of dengue in the Banyumas regency, Indonesia [Reference Wijayanti29]. Two models have been compared, namely a model with the inclusion of covariates and spatially structured random effects only and a model with the inclusion of covariates, spatially structured and unstructured random effects, temporally structured and unstructured random effects and spatio-temporal random effects. The number of DF cases was assumed to be Poisson distributed. Uninformative priors were used for all variables as previous data were not available for Indonesia. The most significant factors that influenced the risk of dengue were found to be employment type and economic status. Wijayanti et al. [Reference Wijayanti29] explored only the unstructured interaction effect model (type I). The type II–IV interaction effects in spatio-temporal models of relative risk were not explored, which are temporal interactions, spatial interactions and inseparable space–time interactions, respectively. Martínez-Bello et al. [Reference Martínez-Bello, López-Quílez and Prieto47] explored type I–IV interaction effects, finding that the best model had the inclusion of a fixed coefficient of lag-zero epidemiological periods Land Surface Temperature (LST) and type IV interaction effects.
GLMM with spatio-temporal random effects + temporal trend: Sani et al. [Reference Sani35] developed a spatio-temporal convolution model as an extension of the spatial convolution model introduced by Eckert et al. [Reference Eckert65] and used this to analyse the relationship between covariates (rainfall and population density) and dengue risk. The number of dengue cases y ij was assumed to be Poisson distributed and the relative risk μ ij given by:
where (α + δ i)j z is a temporal trend and
with n ij denoting the number of population at area i time j.
The authors found that both rainfall and population density affected the number of dengue cases.
This spatio-temporal convolution model has been extended to include the probability of incident risk $\Pr \lpar {I_{ij}} \rpar $ into the model to overcome a misidentification of dengue location [37]. The extended model is as follows:
where
This extension resulted in more accurate estimates when compared with the previous models [Reference Sani35, Reference Eckert65]. They also concluded that both rainfall and population density significantly affected the number of dengue cases.
GLMM zero-inflated Poisson spatio-temporal model: Zero-inflated spatio-temporal models that can be applied to both continuous and discrete data have been proposed [Reference Fernandes, Schmidt and Migon33]. When observations exhibit an excessive number of zero values, the zero-inflated model is often more appropriate. These have been applied to estimate the probability of the presence of unobserved dengue disease in region i and time j.
A Bayesian mixed zero-inflated Poisson spatio-temporal (BMZIP S-T) model [53] has also been constructed.
The BMZIP S-T model is expressed as
where μ ij = ϕ ij/(1 − ϕ ij) and is modelled as per equation (2).
A spatio-temporal quasi-Poisson model: A spatio-temporal quasi-Poisson model based on the DLNM approach has been proposed to identify the relationship between the non-linear delayed impact of meteorological variations and dengue risk in southern Taiwan and to predict dengue cases in the coming weeks [Reference Chien and Yu49]. The number of weekly DF cases y ij was assumed to have a Poisson distribution as follows.
where the vector β contains the coefficients of the indicator variable year, f(Time) is the time smoother described by a cubic spline; f(T, lag = 20) and f(R, lag = 20) are functions of temperature and rainfall with a maximum temporal lag of 20 weeks, respectively; f spac(d) is a spatial function which was modelled using the CAR prior structure, and the offset is the logarithm of average annual population data. The authors found that the most significant factors that influenced DF epidemics were the weekly minimum temperature and the maximum 24 h rainfall. When the minimum temperature rises, the dengue relative risk increases, particularly at a lagged period of 5–18 weeks.
Hierarchical model with adaptive natural cubic spline: Johansson et al. [Reference Johansson, Dominici and Glass32] proposed a model that includes population size N j, covariates at distributed lags l k and a natural cubic spline smoothing function of time s(j, λ), where λ denotes the degree of annual freedom and is set to λ = 2. The distributed lag model is used to evaluate the effect of weather on dengue spread in the next 6 months. For each area i, the number of monthly dengue cases at time j, y j, is assumed to be Poisson distributed as follows:
A two-level approach was used to compare β k from the area-specific models. At the first level, area-specific (i) parameter estimates $\hat{\beta} _i$ were assumed to be normally distributed:
Effect modifiers z 1, z 2, …, z Q were added to estimate α 0 (the average effects) and the effect modification α q:
The authors found a positive correlation between monthly variation in temperature and precipitation and monthly variation in the spread of dengue, and that correlation varies spatially.
BME method: BME is popular in the study of natural systems (physical, biological, social or cultural) and for attributes that are characterised by space–time dependence and multi-sourced uncertainty. Two major knowledge bases (KB) for the spatio-temporal modelling in the BME method are: (1) the general KB (G-KB) that may include scientific theories, theoretical space–time dependence models and epidemic models; and (2) the site-specific KB (S-KB) that includes hard data and soft data, often with a significant amount of uncertainty [Reference Kanevski66]. The BME method incorporates both knowledge bases [Reference Christakos67, Reference Christakos68].
A spatio-temporal model that is based on a stochastic BME method has been used to predict DF outbreaks based on space and time and to examine the association between DF incidence and selected climate variables in Southern Taiwan [Reference Yu30]. In the BME analysis, the spatio-temporal distribution of DF occurrences is mathematically represented by the spatio-temporal random field, X(p) or X i,j where i and j indicate the areas and time, respectively. DF incidence is assumed to be Poisson distributed as follows:
with DF mean $\bar{X}_{ij} = R_{ij}\lambda _{ij}$ and λ ij is a climate-driven space–time process modelled by the log-link Poisson regression
where β l, γ m, θ n, ρ o, ϕ p and φ q are regression coefficients for temperature, logarithm of rainfall, SOI, BI, maximum temperature and minimum temperature, respectively (for the weekly temporal lags between a and b, c and d, e and f, g and h, r and s, and t and u, respectively) and n ij is the population size. The authors conclude that climatic conditions significantly affect DF outbreaks. Yu et al. [Reference Yu55] extended their previous model by inclusion of a stochastic susceptible–infected–recovered (SIR) model, that is, BME-SIR to obtain online space–time predictions of DF transmission. This model considered stochastic differential equations, characterising both the spatio-temporal pattern of disease spread and the heteroscedastic variance pattern across space and time. The aim was to achieve online updates of SIR model parameters.
A SIR-SI model: A discrete space–time stochastic susceptible–infective–recovered for human populations; susceptible–infective for mosquito populations (SIR-SI) model has been developed to circumvent problems of relative risk estimation using standardised morbidity ratios and the Poison-γ model which does not allow for spatial correlation [Reference Samat and Percy54]. The SIR-SI model was defined as follows:
with a non-stochastic vector population as follows:
Here the superscripts (h) and (v) represent the human and mosquito populations, respectively. $S_{i,j}^{\lpar h \rpar }, \; I_{i,j}^{\lpar h \rpar } $ and $R_{i,j}^{\lpar h \rpar } $ are the total number of susceptible, infective and recovered humans in area i for time j, respectively; ${\rm \Im} _{i,j}^{\lpar h \rpar } $ and ${\rm \Re} _{i,j}^{\lpar h \rpar } $ are the number of newly infective and recovered humans; μ (h) is the weekly birth and death rates in the human population; γ (h) is the rate of weekly recoveries; b is weekly biting rate; m is the number of alternative hosts available; β (h) is the probability of transmission from mosquito to human, and β (v) is the converse; and $N_i^{\lpar h \rpar } $ is the number of humans in area i.
The number of new infections is assumed to follow a Poisson distribution with mean $\lambda _{i,j}^{\lpar h \rpar } $, intercept $\beta _0^{\lpar h \rpar } $ and spatial random effect $c_i^{\lpar h \rpar } $ using a CAR prior. Models were applied to all of Malaysia divided into 16 states. The results showed that the proposed SIR-SI model that considers the inclusion of the transmission process of dengue disease, covariates and spatial correlation was preferred over unmodelled SMRs or the Poisson-γ model. The authors also identified areas with very high and high dengue risk.
STARM: The STARM model is an extension of an autologistic regression model that includes covariates, spatial and temporal dependence simultaneously. This model has been applied to predict the association between the incidence of endemic dengue and rainfall using a Bayesian method [Reference Astutik34]. For binary data that are measured repeatedly on a spatial lattice, STARM can be very beneficial [Reference Zheng and Zhu69]. The incident rate (IR) is converted to the binary scale as a representation of the A. aegypti spread, that is, 1 if there is endemic dengue (IR > 10/100 000 population) and 0 if there is no endemic dengue. Endemic level and rainfall are dependent and independent variables, respectively. The STARM model may be defined as follows:
where Y i,j is dengue endemic at the ith region and the jth time, X 1,i is rainfall index at the ith region, N i,j is neighbourhood structure. θ 0, θ 1, θ 2, θ 3 are an intercept and coefficients for rainfall, spatial autoregression and temporal autoregression, respectively. The authors use the inverse Gaussian as a prior distribution for each of θ 0, θ 1, θ 2, θ 3. The result showed that there is a positive correlation between the endemic level of DHF incidence and rainfall.
A semi-parametric Bayesian spatio-temporal geoadditive STAR model: A spatio-temporal geoadditive STAR model has been used to evaluate the impact of indoor residual spraying and spatial autocorrelation in the odds of dengue infection [Reference Vazquez-Prokopec31]. A predictor structure for the spatio-temporal geoadditive model is given as follows:
where η ij, x ij1, …, x ijk are predictor and covariate values for individual i at time j. The fixed effects of non-linear function of covariates (f 1, …, f k) and non-linear time trend f time were modelled by independent diffuse priors using Bayesian penalised splines. f spat is a spatially structured random effect of the location s ij using Markov random field priors and ${u}^{\prime}_{ij}\gamma $ are linear predictors for the covariate vector u.
This STAR model assessed the impact of rain, spray cumulative proportion (cum_spr) and spatial correlation (spat) on the odds of dengue virus infection, where the probability of infection followed a binomial distribution (0 if there is no infection, 1 if there is an infection) at house level (1490 premises) as follows:
The authors compared two STAR models, that is, a model with and without rain as a fixed effect. Interestingly the results showed that a model without a rain covariate was better able to describe the spatial pattern of dengue infection. The authors concluded that there was a significant positive correlation between the number of indoor residual spraying applications up to a time lag of 2 weeks and the weekly number of cases.
Transmission model based on Ross–Macdonald theory: A dengue transmission model based on the Ross–Macdonald theory has been proposed to identify the pattern of dengue transmission in space and time. This model incorporates four essential sub-models, that is, female mosquito density dynamics, human daily movement, virus transmission and estimation of parameters [Reference Zhu45] that can be explained as follows.
The correspondence between reported incidence and modelling cases is given as follows:
where δ is reported incidence rate, ρ j = (ρ 1j, ρ 2j, …, ρ Ij)J is a vector of the estimated number of incidences at each time period j, ε j is an error term and Γj = (γ 1j, γ 2j, …, γ Ij)J is the vector space–time surveillance data at time j.
Female mosquito density $x_i^k \lpar j \rpar $ with age k at time j in district i can be calculated as
where K is the proportionality coefficient between the vector density and BI, B i(j) is the value of BI at time j in district i, p(k) is the daily survival probability of adult mosquitoes at age k. To estimate model parameter K, MCMC methods were used.
Human daily commuting into different areas, which is defined as those who work or study in different districts and who go out in the morning and return in the afternoon, is assumed to impact on dengue transmission as follows:
where T il is the number of travellers leaving from district i to l; N i is the population in district i; S il is the total number of residents in the circle whose centre is the origin district i and radius is the distance between district i and the destination district l, minus the population at i and l. T i is the total number of travellers leaving from district i which is defined as T i = N i(N c/N), where N c and N are the total number of travellers and the total population, respectively.
In virus transmission modelling, vectorial capacity, which is defined as the mean of infectious mosquito bites per unit time, is used to evaluate the infectivity from mosquitoes and is calculated as
where $V_i^k $, $m_i^k $ represent vectorial capacity contributed by mosquitoes with age k in district i, and the ratio of mosquitoes at age k to humans, respectively. Here a i and e k+q represent Aedes mosquito biting rate of humans in district i within 12 h, and the expectation of remaining infectious life at age k + q, respectively.
The authors concluded that the space–time distribution of incidence is highly heterogeneous, with 81.6% of transmission occurring in urban centres in Guangzhou, China with a peak in mid-October. They also found that there is inconsistency between infected cases and reported cases in space–time. Vector indices and human mobility factors significantly affect the dengue transmission patterns. Urban areas had the highest incidence rates and suburban areas had the second highest incidence rates.
Assessment of quality
Using the adapted tool for assessment of modelling study quality, quality scores for the reviewed paper ranged from 7 to 16 out of 16 (Table 3). One study was classified as low quality, three as medium quality, 10 as high quality and 17 as very high quality. The median score was 14/16, which is categorised as high quality. Details on the quality of data were lacking in many papers.
AaO, aims and objectives; SaP, setting and population; MS, model structure; MM, modelling methods; PRDS, parameter ranges and data sources; QoD, quality of data; PoR, presentation of results; IDoR, interpretation and discussion of results; FS, final score.
Discussion
Covariates
Climatic variables were dominant among covariates used by studies to predict the DF outbreaks based on place and time. Precipitation and temperature were the most common and most commonly significant predictors [Reference Johansson, Dominici and Glass32, Reference Chien and Yu49, Reference Lekdee and Ingsrisawang52]. Most studies found a positive significant association with precipitation [Reference Astutik34, Reference Sani35, 37, Reference Restrepo, Baker and Clements50], although one study found a negative correlation with precipitation 4–6 months previously [Reference Lowe48]. These more complex associations depend on local seasonal patterns.
Population density was the most common demographic factor used in modelling DF. Study results were generally consistent in showing that population density was positively significantly correlated with an increase in dengue cases [Reference Lowe28, Reference Sani35, 37, Reference Vargas42]. Furthermore, urban areas had higher incidence rates than suburban areas [Reference Zhu45]. Kikuti et al. [Reference Kikuti38] found that population density and the percentage of population aged under 15 years were correlated with detection of dengue. Human daily mobility as an indicator of demography, referring to commuting, was included for only one study [Reference Zhu45]. Human movement significantly affects the spatial spread of infectious disease like dengue [Reference Paz-Soldan70, Reference Steven71]. Therefore, it is important to incorporate the variety of human movements in modelling dengue transmission.
The most common socio-economic variable was educational level, followed by income and garbage collection. The significance of socio-economic factors differ by regions, but dengue is often more common among those of lower socio-economic status. For example, in Brazil, inadequate garbage disposal and income were the most significant factors related to the incidence of dengue [Reference Honorato39] and lower socio-economic status (within a slum society) increased the risk of dengue [Reference Kikuti38]. In Indonesia, the most significant factors that influenced the risk of dengue were employment type and education level [Reference Wijayanti29].
Incorporation of entomological data in modelling DF to determine their impact on DF relative risks has been used by only six studies. Some studies have found that the most significant effect on the relative risk of DF is a larva-free home [Reference Jaya36] and there was a significant positive correlation between the number of indoor residual spraying applications up to a time lag of 2 weeks and the weekly number of cases [Reference Vazquez-Prokopec31]. Without mosquitoes, dengue cannot be transmitted.
Geographical data were used in nine studies reviewed. The most common indicator was altitude [Reference Lowe26–Reference Lowe28] and mean vegetation index [Reference Wijayanti29, Reference Martínez-Bello, Lopez-Quilez and Alexander Torres46, Reference Martínez-Bello, López-Quílez and Prieto47]. Only one study [Reference Kikuti38] used three indicators, namely the mean elevation from sea level, elevation range and distance from the CTs to the public health unit. The other studies used only one indicator. Lowe et al. [Reference Lowe28] found that altitude was significantly negatively correlated with relative risk of dengue. Kikuti et al. [Reference Kikuti38] showed that residential adjacency to the health unit was most significantly correlated with dengue cases detection and the spatial distribution of dengue cases detection was heterogeneous. Therefore, it is important to consider neighbourhood features when evaluating DF risk.
This review has also shown that almost all the reviewed papers that include a lag time use climate data [Reference Lowe26–Reference Lowe28, Reference Yu30, Reference Johansson, Dominici and Glass32, Reference Pepin44, Reference Lowe48, Reference Chien and Yu49] and only a few papers include entomological data [Reference Yu30, Reference Pepin44]. Various temporal lags and climate variables have been used to find the most significant combination of temporal lags in describing the relative risk of dengue and predicting DF outbreak. For example, in Thailand, temperature and precipitation were significant dengue predictors with a time lag of 1 month preceding, but precipitation 4–6 months preceding was negatively correlated with dengue relative risk [Reference Lowe48]. Three studies reported on slightly different covariates in Brazil. The first of these used temperature and precipitation with time lags of 3 months [Reference Lowe28]. The second used an additional covariate ONI with time lags of 4 months [Reference Lowe27]. The third used SOI with time lags of 6 months but precipitation and temperature were 1 and 2 months previously [Reference Lowe26]. Lowe et al. [Reference Lowe26] have highlighted that SOI significantly affects the time signal of dengue prediction. In southern Taiwan, it was found that the relative risk of DF increased when weekly minimum temperature increased with time lag over 4 weeks [Reference Chien and Yu49]. Therefore, in order to predict DF outbreak more precisely, a variety of temporal lags of climate variables and other covariates such as entomological data should be considered.
Modelling approaches
Fully Bayesian methods are becoming more common as an alternative to the frequentist methods for spatial analysis of diseases. The benefit of Bayesian methods is that they can reduce the estimated variance particularly for regions with small populations [Reference Honorato39]. Moreover, with Bayesian approaches, it is easier to incorporate variance components in a hierarchical manner and hence better estimate predictive uncertainty compared with frequentist methods based on maximum likelihood [Reference Lowe26].
GLMMs also play a significant role in modelling spatial and spatio-temporal DF patterns. The inclusion of unstructured random effects in the model can account for overdispersion in dengue count distributions and allow for unknown factors. However, unstructured random effects are not able to overcome spatial dependence between locations. One way to allow for correlated heterogeneity between locations is the inclusion of spatially structured random effects [Reference Lowe26]. Where GLMMs with spatial random effects have been applied, most studies have modelled the spatial random effect using an ICAR prior and adopted a binary adjacency-based neighbourhood spatial weight matrices. An additional two studies used a proper CAR prior [Reference Lekdee and Ingsrisawang52, Reference Samat and Percy54]. However, specific areas like rural areas or areas without neighbours need to be investigated in order to enhance the correlation structure in the model. For example, distance-based weight matrices, may be preferable for investigating the effect of road travel or human mobility. Only one study has used different types of neighbourhood adjacency matrices, namely, binary, weighted by the length of the boundary and by boundary and barriers [Reference Ferreira and Schmidt43]. Since the significance of some covariates change with the use of different adjacency structures, these different types of neighbourhood structures need to be taken into consideration. The inclusion of the spatially structured component using a CAR prior often improves model fit [Reference Kikuti38]. However, the impact of using other smoothing priors has not been done and needs to be further investigated.
Some studies have included both spatial and temporal random effects in modelling DF [Reference Lowe26, Reference Lowe27, Reference Lowe48]. These authors assigned an intrinsic CAR prior to the spatially structured random effects and the first order autoregressive (AR (1)) prior for temporally structured random effects. An AR (1) model assumes that the current value is only affected by the prior value of the previous time period or temporal stage. Carroll et al. [Reference Carroll72] considered temporal structure with a CAR prior distribution which allows more flexible structured variation over time; this model was applied to melanoma data. To increase the model fit, other time-series components that describe seasonal patterns, moving averages, trends, first- or second-order random walks should be considered.
Among the selected studies, only one study used a GLMM with spatial, temporal and spatio-temporal random effects [Reference Wijayanti29], and only one with an additional temporal covariate [Reference Lowe28]. Wijayanti et al. [Reference Wijayanti29] assumed an interaction of two spatially unstructured and temporally unstructured random effects, which means there was no spatial and temporal structure on the interaction. More reasonable space–time interactions can be considered and compared. For example, the combination of spatially unstructured and temporally structured components may be more appropriate when the time trends differ between areas but the spatial structure is similar. Other types of space–time interactions that can be considered are a combination of spatially structured and temporally unstructured [Reference Costa, Donalisio and Silveira41], or a combination of spatially and temporally structured components [Reference Knorr-Held73]. Lowe et al. [Reference Lowe27] applied a first-order autoregressive monthly effect for each region in the space–time interaction, which means that the temporal trends are different for each region point and only depend on one previous time lag, without any spatial structure. The limitation of this interaction model is that neighbours in space are not considered. By working with adjacency matrices, neighbourhoods can be defined for both time and space and incorporated in spatio-temporal autoregressive (STAR) models [Reference Banerjee and Carlin74].
Transmission models have been proposed by two studies [Reference Zhu45, Reference Samat and Percy54]. An SIR-SI included a transmission model for both the human and mosquito populations [Reference Samat and Percy54]. However, the mosquito population was assumed to be non-stochastic. Therefore, there needs to be better integration of transmission (mechanistic) and stochastic models. Other transmission models have been proposed by integrating four sub-models based on the Ross–Macdonald theory. This has been applied to a dengue study in China. However, all of the parameters for the independent variables were assumed to have normal distributions. Other prior distributions could be considered in order to detect patterns of dengue transmission more precisely.
Study limitations
Only studies published in English were considered. It is acknowledged that there are other papers relevant to Bayesian spatial modelling in other languages, so we may have excluded valuable contributions. Furthermore, modelling dengue virus itself was not considered in this review, despite the acknowledged correlation between dengue virus and the number of dengue cases. Finally, inconsistencies in categorisation of some covariate variables, for example, indicators of socio-economic status, socio-demographics and environmental variables, have been found in a number of studies. These inconsistencies were not pursued here.
Conclusions
Various Bayesian modelling approaches that aim to relate a range of possible explanatory variables with DF incidence or risk have been reviewed. Bayesian approaches are recommended instead of frequentist methods as they allow incorporation of a wider range of components of variance at different levels in the model and it is easier to obtain a more complete assessment of prediction uncertainty. Temperature and precipitation were important determinants of the relative risk of DF and predicting DF outbreak.
Most models used GLMM spatial random effects with spatially correlated effects using a CAR prior. Other GLMM models with the inclusion of temporal covariates and temporal trends were used to predict dengue risk. The only model that incorporated four sub-models, that is, female mosquito density dynamics, human daily movement, virus transmission and estimation of parameters was a transmission model based on Ross–Macdonald theory. To improve the precision of model fit of DF, different types of neighbourhood structure, proper and improper priors in spatial random effects, temporally structured effects and types of space–time interaction should be considered.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0950268818002807
Acknowledgements
The authors would like to thank Professor Wenbiao Hu for his valuable suggestions.
Financial support
This work was partially funded by the Australian Research Council Centre of Excellence for Mathematical and Statistical Frontiers (ACEMS).
Conflict of interest
None.