 Research Article
 Open Access
 Published:
Estimating the basic reproduction number for singlestrain dengue fever epidemics
Infectious Diseases of Poverty volume 3, Article number: 12 (2014)
Abstract
Background
Dengue, an infectious tropical disease, has recently emerged as one of the most important mosquitoborne viral diseases in the world. We perform a retrospective analysis of the 2011 dengue fever epidemic in Pakistan in order to assess the transmissibility of the disease. We obtain estimates of the basic reproduction number R_{0} from epidemic data using different methodologies applied to different epidemic models in order to evaluate the robustness of our estimate.
Results
We first estimate model parameters by fitting a deterministic ODE vectorhost model for the transmission dynamics of singlestrain dengue to the epidemic data, using both a basic ordinary least squares (OLS) as well as a generalized least squares (GLS) scheme. Moreover, we perform the same analysis for a directtransmission ODE model, thereby allowing us to compare our results across different models. In addition, we formulate a directtransmission stochastic model for the transmission dynamics of dengue and obtain parameter estimates for the stochastic model using Markov chain Monte Carlo (MCMC) methods. In each of the cases we have considered, the estimate for the basic reproduction number R_{0} is initially greater than unity leading to an epidemic outbreak. However, control measures implemented several weeks after the initial outbreak successfully reduce R_{0} to less than unity, thus resulting in disease elimination. Furthermore, it is observed that there is strong agreement in our estimates for the precontrol value of R_{0}, both across different methodologies as well across different models. However, there are also significant differences between our estimates for the postcontrol value of the basic reproduction number across the two different models.
Conclusion
In conclusion, we have obtained robust estimates for the value of the basic reproduction number R_{0} associated with the 2011 dengue fever epidemic before the implementation of public health control measures. Furthermore, we have shown that there is close agreement between our estimates for the postcontrol value of R_{0} across the different methodologies. Nevertheless, there are also significant differences between the estimates for the postcontrol value of R_{0} across the two different models.
Multilingual abstracts
Please see Additional file 1 for translations of the abstract into the six official working languages of the United Nations.
Background
Global incidence of dengue has seen a striking increase over the past few decades [1, 2]. The infectious disease is now endemic in more than a hundred tropical and subtropical countries worldwide [1–3]. With an estimated 50–100 million cases and nearly 10,000 20,000 deaths annually, dengue ranks second to Malaria amongst deadly mosquitoborne diseases [1, 2, 4–6]. The disease is caused by one of four virus serotypes (strains) of the genus Flavivirus[2, 3, 7]. Most infected individuals suffer from dengue fever, a severe flulike illness characterized by high fever, which is not usually a threat to mortality [2, 8]. The symptoms of the disease last one to two weeks, after an initial incubation period of about 4–7 days [9]. Some infected individuals however, develop dengue hemorrhagic fever (DHF) resulting in bleeding, low levels of blood platelets and blood plasma leakage, or dengue shock syndrome (DSS) resulting in extremely low blood pressures. The risk associated with DHF and DSS is considerably higher, with mortality ranging from 5–15% [3, 5, 9, 10].
There is evidence of dengue epidemics occurring in North America, Asia and Africa in the late 18^{th} century [3]. Up until the middle of the 20^{th} century however, incidences of dengue fever have been rare [3]. Nonetheless, since the 1970’s, there has been a marked increase in the number of dengue cases, as well as the frequency and severity dengue epidemics, with the WHO claiming a 30fold increase in the incidence of dengue between 1960 and 2010 [2, 3, 8]. Factors such as population growth, rapid urbanization and increase international travel are often cited as having contributed to this dramatic increase [8]. Dengue is currently endemic in nearly 110 countries in Southeast Asia, the Americas, Africa and the Eastern Mediterranean [2]. The WHO estimates that nearly 2.5 billion people are at risk of contracting the disease. Furthermore, nearly 50–100 million cases and almost 20,000 deaths due to more severe forms of dengue fever are reported globally every year, making dengue one of the deadliest mosquitotransmitted diseases [1, 2, 4–6].
Dengue is transmitted to humans through mosquito bites. Female mosquitos of the Aedas genus, primarily Aedes aegypti, acquire the dengue virus through a blood meal from infected humans [2, 11]. The dengue virus has an incubation period of about 7–10 days in the vector, and is then spread to susceptible humans who are bitten by the infected mosquito [9]. The virus also has an incubation period of 4–7 days in the host [9]. While vectors never recover from infection with the dengue virus, the infection in hosts lasts only about one to two weeks [2]. Hosts that recover from infection with one serotype of the dengue virus gain lifelong immunity from that serotype but only temporary and partial immunity to other serotypes [2, 4, 12–14]. This partial crossimmunity is the cause of antibodydependent enhancement (ADE) in the setting of a secondary infection with a different serotype of DENV (Dengue Virus). ADE is hypothesized to be one factors leading to DHF and DSS, the more severe form of dengue disease [4, 12, 13, 15]. In this study however, we will consider infection involving only a single serotype of the dengue virus.
About 80% of individuals suffering from a primary infection with DENV are asymptomatic or display only a mild, uncomplicated fever [2, 8]. A much smaller proportion of infected individuals suffer from dengue hemorrhagic fever and dengue shock syndrome [3, 5, 9]. As mentioned previously however, risk of DHF and DSS is associated primarily with secondary infection with a heterologous serotype of DENV [4, 12, 13, 15]. In general, the course of infection of dengue can be divided into three separate phases: febrile, critical and recovery. The febrile phase, which is rarely life threatening, is marked by the sudden onset of high fever, rash, headaches and muscle and joint pains, which lead to the alternative name "breakbone fever" for dengue disease [8]. While most individuals then progress to the recovery phase, a small fraction of infected individuals instead progress to the critical phase of the disease. This phase lasts for one or two days and is marked by low blood pressure, leakage of blood plasma from the capillaries and decreased blood supply to organs. Severe cases of these symptoms are associated with DHF and DSS and the mortality in this phase of the disease is estimated to be as high as 5–15% [3, 5, 8, 9].
Over the past several years, a number of deterministic mathematical models have been proposed to analyze the transmission dynamics of dengue in urban communities [5, 11–17]. L. Esteva and C. Vargas [14] have investigated the coexistence of two serotypes of dengue virus using a deterministic ODE model. Moreover, Ferguson et al. [15] have investigated the effects of ADE on the transmission of multiple serotypes of dengue virus. In addition, Garba et al. [11] have shown the existence of a backward bifurcation in a standard incidence ODE model for a single strain of dengue virus. Garba et al. [12] have also explored the effects of crossimmunity on the transmission dynamics of two strains of dengue virus. Similarly, H. Wearing and P. Rohani [13] have investigated the effects of both ADE and cross immunity on multiple serotypes of dengue virus. Finally, Chowell et al. [18] have estimated the basic reproduction number for dengue using spatial epidemic data.
In addition, over the past few decades, several stochastic epidemic models for the spread of infectious diseases have also been proposed and analyzed [19–27]. An important qualitative difference between deterministic and stochastic epidemic models in general is the asymptotic dynamics [28]. Furthermore, stochastic models also allow for the possibility of disease extinction in finite time and therefore the expected time to disease extinction can be calculated [19, 28, 29]. It is also observed that stochastic models better capture the uncertainty and variability that is inherent in reallife epidemics due to factors such as the unpredictability of persontoperson contact [27, 29]. L. J. S. Allen [28, 29] has explored the utility of stochastic epidemic models by comparing them with deterministic models. Despite, the utility of stochastic models, however, very little stochastic modeling has been performed for the transmission dynamics of dengue virus (see [26] and the references therein).
The purpose of this study is to estimate the transmissibility of the dengue virus during the 2011 dengue fever epidemic in Pakistan using epidemic data in the form of the cumulative number of reported cases of dengue. We will employ three different techniques, applied to two different models and compare the results across both the different statistical inference methodologies as well as the different models. The first approach, based on the earlier work of CintronArias et al. [30], will involve fitting a deterministic epidemic model for the transmission of dengue to the epidemic data using an ordinary least squares (OLS) scheme implemented using the builtin optimization toolbox in MATLAB and applied in the context of an appropriate statistical model. The second method, also based on the recent work of CintronArias et al. [30], will use a generalized least squares (GLS) scheme to fit the same deterministic model to the epidemic data. Furthermore, both approaches will also be applied to a different directtransmission model for the transmission dynamics of dengue. Finally, the third approach will involve the formulation of a direct transmission stochastic epidemic model for dengue. We shall then use Markov chain Monte Carlo techniques to obtain a probability distribution for the model parameters.
A simple but effective measure of the transmissibility of an infectious disease is given by the basic reproduction number R_{0}, defined as the total number of secondary infections produced by introducing a single infective in a completely susceptible population [31]. For vectorborne diseases such as malaria and dengue, R_{0} is the number of secondary cases produced by a single infectious vector introduced in a completely susceptible host and vector population. In general, for simple epidemic models, if R_{0} is greater than unity, an epidemic will occur while if R_{0} is less than unity, an outbreak will most likely not occur. Thus, the value of R_{0} can be used to determine the intensity of control measures that need to be implemented in order to contain the epidemic.
The estimation of the basic reproductive number is generally an indirect process because the model parameters that R_{0} depends on are difficult or impossible to determine directly. The general methodology used therefore, attempts to fit an epidemic model to available epidemic data in order to estimate the model parameters. These parameters are then used to estimate the basic reproduction number R_{0}. The current study is based on this methodology.
Results
Applying the algorithm for the ordinary least squares (OLS) methodology to the vectorhost model (1.1) results in a value of C_{ HV }= 8.1897 week ^{}1 before the implementation of control measures and a value of C_{ HV }= 0.9523 week ^{}1 after the implementation of control measures. Thus, we obtain an estimate of R_{0} = 2.9871 before the implementation of control measures and R_{0} = 0.3473 after the implementation of control measures. The bestfit trajectory of model (1.1) calculated using the OLS methodology, along with the epidemic data is displayed in Figure 1. Similarly, implementing the algorithm for the generalized least squares (GLS) methodology results in a value of C_{ HV }= 8.0976 week ^{1} before the implementation of control measures and a value of C_{ HV }= 1.2374 week ^{1} after the implementation of control measures. Thus, when using the GLS scheme, we obtain an estimate of R_{0} = 2.9535 before the implementation of control measures and R_{0} = 0.4513 after the implementation of control measures. Figure 2 displays the bestfit trajectory of model (1.1) calculated using the GLS methodology. We observe that the GLS estimates for R_{0} before the implementation of control measures are in close agreement with the results obtained using the OLS scheme. There is however, difference between the estimates for R_{0} after the implementation of control measures.
Application of the algorithm for the ordinary least squares (OLS) methodology to the directtransmission model (3.1) results in a value of β = 3.0302 week ^{1} before the implementation of control measures and a value of β=0.6622 week ^{1} after the implementation of control measures. This corresponds to a value of R_{0} = 3.0182 before the implementation of control measures and R_{0} = 0.6596 after the implementation of control measures. Figure 3 displays the bestfit trajectory of the directtransmission model (3.1) calculated using the OLS scheme. We observe that the precontrol value of R_{0} is slightly larger (1.04%) than the corresponding value obtained by applying the OLS scheme to the vectorhost model (1.1), which we feel is not a statistically significant increase. However, the postcontrol value of R_{0} is significantly higher (89.92%) than the corresponding value obtained by applying the OLS scheme to the vectorhost model (1.1).
In addition, precontrol estimates of R_{0} for the dengue epidemic from the uncertainty analysis are shown in Figure 4. The 95% confidence interval for the precontrol value of R_{0} is given by (2.0293,6.5310). The probability that R_{0}, before the implementation of control measures, is greater than unity is more than 99.9%. Similarly, the sensitivity analysis, shown in Figure 5, suggests that the most significant (PRCC values above 0.5 or below 0.5) sensitivity parameters to R_{0} are τ_{ H }, σ_{ V }and μ_{ V }. This suggests that these parameters need to be estimated with precision in order to accurately capture the transmission dynamics of dengue.
Furthermore, implementing the algorithm for the generalized least squares methodology in the case of the directtransmission model (3.1) results in a value of β = 3.0920 week ^{1} before the implementation of control measures and a value of β = 0.5895 week ^{1} after the implementation of control measures. These values of the contact rate result in an estimate of R_{0} = 3.0797 before the implementation of control measures and R_{0} = 0.5872 after the implementation of control measures. The bestfit trajectory of the directtransmission model (3.1) calculated using the GLS scheme is displayed in Figure 6. We observe that while these results are in close agreement with the corresponding results obtained by applying the OLS scheme to the directtransmission SEIR model, the estimated value of postcontrol R_{0} is again, significantly larger (30.11%) than the corresponding value obtained by applying the GLS scheme to the vectorhost model. However, there is only a slight increase (4.27%) in the estimate for the precontrol value of R_{0} across the different models, which we again feel is not a statistically significant increase.
Finally, implementation of the MCMC algorithm to the stochastic directtransmission model (4.1) results in a value of β = 3.0650 week ^{1} before the implementation of control measures and a value of β = 0.6318 week ^{1} after the implementation of control measures. These values of the contact rate correspond to values of R_{0} = 3.0528 before the implementation of control measures and R_{0} = 0.6293 after the implementation of control measures. The probability distributions of the contact rate β, both before and after the implementation of control measures are displayed in Figure 7 and Figure 8 respectively. Moreover, the mean trajectory of the stochastic model (4.1) calculated using Monte Carlo simulations involving the mean of the posterior distribution is displayed in Figure 9. The results of the MCMC methodology are in close agreement with the results obtained by applying the GLS and OLS schemes to the deterministic directtransmission model (3.1). Furthermore, there is also close agreement in our estimates for the precontrol value of R_{0} across both the previous methodologies as well as the different models. There is however, again, a significant difference between the postcontrol value of R_{0}, obtained using the MCMC algorithm and the estimates of the postcontrol value of R_{0}, obtained by applying the OLS and GLS scheme to the vectorhost model (1.1).
Discussion
We have performed a retrospective analysis of the 2011 dengue fever epidemic in Pakistan and obtained estimates of the basic reproduction number R_{0}, from epidemic data using three different methods. R_{0}, defined as the total number of secondary infections produced by introducing a single infective in a completely susceptible population, is a simple but effective measure of the transmissibility of an infectious disease. In each case it was observed that the value of R_{0} was initially well in excess of unity, leading to the observed epidemic outbreak. Some weeks after the initial outbreak however, control measures were successfully implemented that reduced the value of R_{0} to less than unity, thus resulting in disease elimination.
Several methods have been proposed for the estimation of R_{0}, both for deterministic as well as for stochastic models. These methods depend upon the mathematical model of the disease as well the nature of the data. In the case of deterministic compartmental models, least squares fit to the data has been widely used to estimate the model parameters [18, 30, 32]. For stochastic models likelihood based techniques have been used by several authors [33, 34]. We consider two ODE based deterministic models and a Continuous Time Markov Chain (CTMC) based stochastic model. Using least squares estimation for the deterministic models and a Markov Chain Monte Carlo based approach for the stochastic model, we compare the value of R_{0} obtained using the different models. We note that this is the first such study performed for the Dengue Epidemic in Pakistan.
The first inference methodology involved fitting a deterministic ODE model for the transmission dynamics of singlestrain dengue to the epidemic data using a basic ordinary least squares (OLS) scheme in the context of a statistical model which assumed longitudinally constant variance for the epidemic data. An a priori more realistic methodology was used to fit the deterministic ODE model to obtain estimates of the model parameters using a generalized least squares (GLS) scheme which made use of a statistical model that assumed that variances associated with the observation process were directly proportional to the measurement values. One of the questions we tried to address was whether or not the variances were strongly dependent on the observations. Finally, we formulated a discretetime, direct transmission, stochastic model for the spread of dengue virus and used Markov chain Monte Carlo (MCMC) methods to perform Bayesian inference and estimate the basic reproduction number.
We observe that the estimates for the basic reproduction number R_{0} before the implementation of control measures are in excellent agreement for the same model across different methodologies. Similarly, across different models there is a very slight but nonetheless statistically insignificant difference in our estimates of the precontrol basic reproduction number. We therefore conclude that our estimation of the precontrol value of R_{0} is quite robust, both across different methodologies as well as across different models. This leads us to believe that the noise does not depend significantly on the data. Furthermore, agreement in our estimates across models indicates that both the vectorhost model as well as the directtransmission model can be used to accurately capture the disease dynamics of actual dengue epidemics before the implementation of control measures.
While there is also close agreement in our estimates for the basic reproduction number R_{0} after the implementation of control measures across different methodologies, there is nonetheless significant difference between the postcontrol estimates of R_{0} across the vectorhost and directtransmission models. Specifically, R_{0} is estimated to be significantly larger in value when using the directtransmission model as opposed to the vectorhost model. We conjecture that this might be due to the fact that the directtransmission model makes use of an approximation, which involves solving for the equilibrium value of the vector force of infection. Thus, while the vector force of infection rises and peaks for the vectorhost model, before settling to its equilibrium value, it is in effect equal to the smaller equilibrium value for the directtransmission model. Therefore, since the vector force of infection for the direct transmission model is, in effect, smaller for the time period after the implementation of control measures, it results in a larger estimate of the basic reproduction number in order to produce a ‘bestfit’ for the observed epidemic data.
Conclusion
In conclusion, we have attempted to assess the transmissibility of the dengue virus during the 2011 dengue fever epidemic in Pakistan by estimating the basic reproduction number R_{0} both before and after the implementation of public health control measures. Our estimates for the precontrol value of R_{0} are in close agreement both across different methodologies and the different models. Furthermore, the postcontrol estimates are also in close agreement across the different methodologies. There is however, a significant increase in the estimates of the postcontrol value of R_{0} obtained using the directtransmission model compared to estimates obtained using the vectorhost model.
Methods
Methods and materials for statistical inference using the vectorhost model
The vectorhost epidemic Model
The model is a deterministic vectorhost ODE model that assumes a homogeneous mixing of the host (human) and vector (mosquito) populations. The total human population at time t, denoted by N(t), is divided into four mutually exclusive classes comprising of susceptible humans S_{ H }(t), exposed humans E_{ H }(t), infected humans I_{ H }(t) and recovered humans R_{ H }(t). It is assumed that individuals who recover from infection with a particular serotype of Dengue gain lifelong immunity to it [2]. Similarly, the total vector population at time t is denoted by N_{ V }(t) and is divided into three mutually exclusive classes comprising susceptible of susceptible vectors S_{ V }(t), exposed vectors E_{ V }(t) and infected vectors I_{ V }(t). It is assumed that vectors (mosquitoes) infected with a particular serotype of Dengue never recover [2]. We also modify the original model of Garba et al. [11] by assuming that exposed humans and exposed vectors do not transmit the disease.
The model assumes that the susceptible human population S_{ H }(t) has a constant recruitment rate Π_{ H }and natural death rate μ. Susceptible individuals are infected with Dengue virus (due to contact with infected vectors) at a rate λ_{ H }and thus enter the exposed class E_{ H }. The exposed population E_{ H }(t) is depleted at the natural death rate μ. Additionally, exposed individuals develop symptoms and move into the infected class I_{ H }at a rate σ_{ H }. The infected population I_{ H }(t) is depleted via the natural death rate μ, the diseaseinduced death rate δ_{ H }and the recovery rate of infected individuals τ_{ H }. Finally, the recovered population R_{ H }(t) decreases due to the natural death rate μ.
Similarly, the susceptible vector population S_{ V }(t) has a constant recruitment rate Π_{ V }and a natural death rate μ_{ V }. Susceptible vectors are infected with Dengue virus (due to effective contact with infected humans) at a rate λ_{ V }and thus move to the exposed vector class E_{ V }. The exposed vector class E_{ V }(t) is depleted at the natural death rate μ_{ V }. In addition exposed vectors develop symptoms and move to the infected vector class I_{ V }at a rate σ_{ V }. Infected vectors, in addition to the natural death rate μ die at a disease induced death rate δ_{ V }.
Mathematically, the model is a follows
A description of the variables and parameters of the model (1.1) is given in Table 1.
Garba et al. [11] have calculated the basic reproduction number R_{0} for their original model. Although, we have made a slight modification to the original model, the basic reproduction number for our model is not significantly different. Hence, for the model (1.1), R_{0} is given by
where
Representative trajectories of the model (1.1) for different values of R_{0} are given in Figure 10 and Figure 11. As expected, a value of R_{0} greater than unity leads to an epidemic while a value of R_{0} less than unity leads to swift disease elimination.
Data sources
The epidemic data used in this study was collected by the Punjab Disaster Management Authority (PDMA) during the 2011 Dengue Epidemic in Punjab, Pakistan. The data, displayed in Figure 12, represents the cumulative number of dengue fever cases reported over a 32week period extending from the 1 ^{st}of August 2011, to the 20^{th}of February 2012. The data was collected from a number of public and private hospitals in the major metropolitan centers of Punjab, including Lahore, Faisalabad and Multan. Patients were classified as infected with dengue virus based on the results of laboratory tests for the dengue specific antibodies Immunoglobulin G (IgG) and Immunoglobulin M (IgM). As per the Government of Pakistan’s directives, the laboratory test was available at a subsidized rate in all major hospitals in the capital city of Lahore. It is therefore unlikely that poverty played a serious role in underreporting of dengue fever cases.
Prior to August 2011, there were three reported cases of dengue fever, all occurring several months before the actual epidemic. This leads us to conclude that these were isolated incidents and were not directly related to the epidemic itself. Furthermore, nearly 87% of all dengue infections were caused by a single serotype (DEN2) of the dengue virus. This justifies our use of a singlestrain epidemic model as opposed to dengue models that incorporate the effects of crossimmunity and ADE.
Estimation schemes
In order to calculate R_{0}, we require the values of several parameters used in model (1.1). Furthermore, we require knowledge of the initial conditions that will be used to simulate trajectories of the model (1.1).
Estimates for several of the model parameters used in model (1.1) can be obtained from existing studies on Dengue Fever. Table 2 lists these parameters along with reasonable estimates of their values.
The recruitment rate for the susceptible host population depends on the demographics of the urban population that is being considered. This study uses epidemic data collected during the 2011 Dengue Epidemic in Punjab, Pakistan. Therefore, in the absence of concrete estimates, the host recruitment rate has been chosen so as to allow for a realistic steady state host population.
The effective contact rate C_{ HV }, which is a measure of the rate at which contact between an infective and a susceptible individual occurs and the probability that such contact will lead to an infection, is extremely difficult to determine directly. Most previous studies have used assumed values for the effective contact rate [11–13]. Thus, it is not possible to directly estimate the basic reproduction number. Consequently, we will adopt an indirect approach, similar to previous studies such as [30] and [18], by first finding the value of the parameter C_{ HV }for which the model (1.1) has the best agreement with the epidemic data, and then using the resultant parameter values to estimate R_{0}.
As mentioned before, for the purpose of simulating model (1.1) we require knowledge of the initial conditions. It is possible to consider the initial conditions (S_{ H }(0),E_{ H }(0),I_{ H }(0),R_{ H }(0),S_{ V }(0),E_{ V }(0),I_{ V }(0)) as model parameters along with the effective contact rate C_{ H V } and estimate values for all parameters. Such a technique, however, produces slightly unreliable results. This is explained by the fact that the available epidemic data is restricted to the cumulative number of dengue cases reported, while the optimization schemes that we will employ produces estimates for eight variables. There are thus too many degrees of freedom and the ‘bestfit’ may result in unrealistic estimates for the initial conditions. We will therefore use reasonable estimates for the initial conditions and restrict ourselves to optimizing only the effective contact rate C_{ HV }.
Table 3 displays the initial conditions that were therefore chosen.
In the following sections we will employ different methods to estimate the parameter ω = C_{ HV }by minimizing the difference between the predictions of model (1.1) and the epidemic data.
Ordinary Least Squares Estimation
We will first attempt to estimate the effective contact rate by fitting the best trajectory of model (1.1) to the epidemic data using an ordinary least squares (OLS) scheme, implemented using the fminsearch function in the builtin Optimization Toolbox in MATLAB. This will allow us to estimate the parameter ω and thus calculate the basic reproduction number R_{0}.
In order to fit model trajectories with the observed epidemic data, we will assume that all reported cases recovered from the infection after a time lag of two weeks. Thus, the epidemic data, after a lapse of two weeks, represents the total recovered host population. In addition, in order to account for the effect of control measures that were implemented during the actual epidemic, we have assumed that the transmission rate C_{ HV }is a function of time t. We will assume that the transmission rate was constant up until the point when control measures were implemented, whereupon it changed to a different constant value. An alternative definition of the contact rate is also given in [33]. Thus, mathematically,
where t^{∗} is the time at which control measures were first implemented. For the purpose of this study and in view of no concrete information being available in this regard, we have assumed that t^{∗} = 8 weeks.
For the purpose of this section we shall employ the notation and methodology developed in [30]. Essentially, we will employ inverse problem methodology to obtain estimates of ω = C_{ HV }by minimizing the difference between the observed weekly cumulative number of recovered host individuals and the model predictions using a ordinary least squares (OLS) criterion. This will be done in the context of an appropriate statistical model.
The ordinary least squares estimation methodology that we will employ is based on model (1.1) as well as an appropriate statistical model for the observation process. Thus, similar to [30], we assume that the model (1.1), together with a ‘true’ value of the parameter ω, given by ω_{0}, perfectly describes the transmission dynamics of the dengue epidemic. Moreover, we assume that the N observations, Y_{ j }, given by the epidemic data, are generated by a statistical process. However, the N observations also contain random deviations from the underlying statistical process. Hence, following [30], it is assumed that
where p(t_{ j }, ω_{0}) denotes where p(t_{ j }, ω_{0}) denotes the total number of infected individuals who recover in the time span of week j when using the ‘true’ parameter ω_{0}. Thus, it is assumed that the observed epidemic data is a particular realization of the statistical model (2.2). Under the framework of model (1.1) therefore,
where t_{0} denotes the time of start of the statistical process and the time of each observation is given by and ordered as t_{0} < t_{1} < t_{2} … .. < t_{ N }.
The error terms ε_{ j }, are assumed to be independent, identically distributed (i.i.d) random variables, each with zero mean and finite variance. No further assumptions are made regarding the distribution of ε_{ j }. In particular, it is not assumed that the distribution is normal. Thus, E[ ε_{ j }] = 0 and v ar (ε_{ j }) = σ^{2}. Note that, the i.i.d assumption implies that the variance for each error term ε_{ j }is identical. We therefore have
For a set of observations ${\left\{{Y}_{j}\right\}}_{j=1}^{N}$, produced by the statistical model (2.2), we define the statistical estimator ω_{ OLS }as
where $\Omega \subset \mathbb{R}$ is defined as the physically and biologically feasible region for the parameter ω. In other words, the statistical estimator is the solution of the following equations
It is clear that the statistical estimator ω_{ OLS }is a random variable since each error term ε_{ j }is a random variable. Furthermore, the estimator attempts to minimize the distance between the observed weekly cumulative number of recovered host individuals and the predictions of model (1.1). A subsequent detailed description of how to obtain the probability distribution associated with ω_{ OLS }is given in [30]. Our goal in the current study will be to obtain the mean of the probability distribution of ω_{ OLS }.
Uncertainty and sensitivity analysis of R _{ 0 }
As mentioned previously, the basic reproduction number R_{0} for the deterministic model (1.1) is given by (1.2) Thus, the value of R_{0} depends on the variables C_{ HV },Π_{ H },μ,σ_{ H },δ_{ H },τ_{ H },Π_{ V },σ_{ V },μ_{ V }and δ_{ V }. While deterministic models implicitly assume that the model parameters are not stochastic in nature, an element of uncertainty is always associated with estimates of these parameters due to factors such as natural variation, errors in measurements and lack of measuring techniques. In general, uncertainty analysis quantifies the degree of confidence in the parameter estimates by producing 95% confidence intervals (CI) which can be interpreted as intervals containing 95% of future estimates when the same assumptions are made and the only noise source is observation error. Additionally, sensitivity analysis identifies critical model parameters and quantifies the impact of each input parameter on the value of an output. In this section, we shall perform uncertainty and sensitivity analysis of the basic reproduction number R_{0}. A detailed description of the history and methodology of uncertainty and sensitivity analysis is given in [35].
We will use Latin hypercube sampling (LHS) [35] to quantify the uncertainty in and sensitivity of R_{0} as a function of the 7 model parameters (C_{ HV },μ,σ_{ H },δ_{ H },τ_{ H },σ_{ V },μ_{ V }and δ_{ V }). It is assumed, following [33], that the recruitment rates Π_{ V }and Π_{ H }are constants. This will enable us to obtain CI for the value of R_{0} that we have estimated in the last section. For the sensitivity analysis, the partial rank correlation coefficient (PRCC) technique [35] will be used to assess the impact of changes in parameter values on the value of R_{0}. PRCC, which uses rank transformation of the data to reduce the effects of nonlinearity, provides a measure of monotonicity after the removal of the linear effects of all but one model parameter. PRCC is therefore a global sensitivity analysis technique. The assumed distributions of the model parameters used in the two analyses are mentioned in Table 4.
Generalized least squares estimation
The Ordinary Least Squares Estimation (OLS) scheme we employed in the previous section assumed that the variances associated with the epidemic observations were longitudinally constant and not dependent on the values of the observations. This may not be a realistic assumption especially if the epidemic data is influenced by a source of nonconstant systematic error such as underreporting. Indeed, underreporting of dengue cases has been well documented in previous studies such as [36] and [37].
If indeed the epidemic data that is being used in the current study is influenced by underreporting then the assumption of constant variances associated with the observations is not correct since observation errors will now be proportional to the size of the measurement. Hence, we must use a statistical model, which assumes longitudinally nonconstant and model dependent variances for the epidemic observations. In this section therefore, we will attempt to estimate the effective contact rate by fitting the best trajectory of model (1.1) to the epidemic data using a generalized least squares (GLS) scheme. An excellent discussion on the use of the OLS and GLS scheme and different statistical models depending on the assumptions regarding the error present in the observation process is given in [38].
Once again we shall employ the notation and methodology developed in [30]. Apart from the assumptions of the statistical model, as before, we will assume that all reported cases recovered from the infection after a time lag of two weeks and that therefore, the epidemic data, after a lapse of two weeks, represents the total recovered host population. Furthermore, we will again assume that the effective contact rate C_{ HV }is a function of time. Thus, mathematically, Eq. 2.1 where t^{∗} is the time at which control measures were first implemented. As before, we have assumed that t^{∗} = 8 weeks.
Again, following [30], we will employ inverse problem methodology to obtain estimates of ω = C_{ HV }by minimizing the difference between the observed weekly cumulative number of recovered host individuals and the model predictions using a generalized least squares (GLS) criterion. This will be done in the context of a statistical model, which assumes that the error in the observation process is directly proportional to the values of the measurement. Hence, it is assumed that
where denotes where p(t_{ j },ω_{0}) denotes the total number of infected individuals who recover in the time span of week j using the ‘true’ parameter ω_{0}. Thus, it is assumed that the observed epidemic data is a particular realization of the statistical model (4.1). Under the framework of model (1.1) therefore,
where t_{0} denotes the time of start of the statistical process and the time of each observation is given by and ordered as t_{0} < t_{1} < t_{2} ….. < t_{ N }.
The rest of the analysis is similar to the method outlined in the previous section and follows easily.
Methods and materials for statistical inference using the directtransmission model
The direct transmission epidemic model
Several existing studies on the transmission dynamics of dengue use a direct transmission SEIR model [13]. The direct transmission models can be obtained using an approximation to vectorhost models such as model (1.1). First, it is assumed that the vector force of infection can be approximated by solving for the equilibrium values of the vector population compartments. Furthermore, it is assumed that the susceptible vector population is approximately a linear multiple of the total host population. These two assumptions effectively result in a rescaling of the host effective contact rate C_{ HV }of model (1.1) into a direct transmission contact parameter β. Using the aforementioned approximation, we formulate a standard incidence, direct transmission SEIR model. More details of the approximation are given in [13].
Mathematically, the directtransmission model is a follows
where
A description of the variables and parameters of the model (3.1) is given in Table 5 and Tablee 6 respectively.
For the direct transmission model (3.1), the basic reproduction number R_{0} is defined as the total number of secondary infections produced by introducing a single infected host in a completely susceptible population. Therefore, for model (3.1), the basic reproduction number R_{0} is given by:
As discussed previously, the existing literature on dengue fever provides excellent estimates of all the parameters of Model (3.1) with the exception of the contact rate β. Hence, our aim will be to estimate the contact rate β using statistical inference and thereby estimate the basic reproduction number R_{0}. The basic methodology used for both the OLS and GLS schemes will be similar to the process outlined in the previous sections.
In order to account for the effect of control measures that were implemented during the actual epidemic, we have assumed that the contact rate β(t) is a function of time t. We will assume that the contact rate was constant up until the point when control measures were implemented, whereupon it changed to a different constant value. Hence, mathematically,
where t^{∗} is the time at which control measures were first implemented. For the purpose of this study and in view of no concrete information being available in this regard, we have assumed that t^{∗} = 6 weeks. We observe that β(t) is most likely not a continuous function of time t. An alternative definition of the transmission rate β(t), as a continuous function of time, is given in [33].
The stochastic model and Markov Chain Monte Carlo (MCMC)
In this section we formulate a stochastic, directtransmission, discretetime, (S)usceptible, (E)xposed, (I)nfected and (R)ecovered/(R)emoved (SEIR) model for the transmission dynamics of dengue virus. We then use standard Markov chain Monte Carlo (MCMC) methods to perform Bayesian Inference on the epidemic data to obtain estimates of the basic reproduction number R_{0}. As mentioned in [13], a number of studies exist on the transmission dynamics of dengue that assume directtransmission. Moreover, a simple approximation can be used to reduce a vectorhost model for dengue virus to a direct transmission model (see [13] and the references given therein for more details). The purpose of using a directtransmission model is to make stochastic inference computationally tractable. For the purpose of this study, we will broadly follow the procedure outlined in [33].
Stochastic model formulation
The stochastic SEIR model we will consider is a discretetime model that has been formulated and discussed previously in [25, 33]. Let S(t),E(t),I(t) and R(t) denote the susceptible host, exposed host, infected host and removed or recovered host populations at time t respectively. As is common, our model will assume homogenous mixing of all individuals. Furthermore, consider a time interval (t,t+h], where h denotes the length of time between two observations of the epidemic. In this case therefore, h is one week. Now, let B(t) denote the number of susceptible individuals who have contracted disease, C(t) the number of exposed individuals who have become infected and D(t) the number of infected individuals who die or have recovered from the disease during this time interval. For the sake of simplicity, and keeping in view the low mortality rate associated with dengue fever, we will assume that the diseaseinduced death rate is negligible. Following [33], and in view of the fact that the dynamics of dengue disease take place on a much smaller time scale than the average human life expectancy, we will assume that the total population N remains constant. Thus, mathematically the stochastic model is given by
where,
are random variables with binomial distributions. The probability of success for these binomial random variables is given by
Here, $\beta \left(t\right),\frac{1}{\sigma}$ and $\frac{1}{\tau}$ are the time dependent transmission rate, the mean latency period and the mean infectious period respectively. Thus, in model (4.1) the transitions from one compartment to another are formulated as an exponentially distributed stochastic movements. The probability that each individual will stay in a specific compartment for a time period h is given by e xp(Π h), where Π is the compartment specific movement rate. The binomial distributions in (4.2) are then obtained by summing over the individual Bernoulli trials for every individual in the compartment. It is assumed that each trial is independent and identical for every member of the compartment.
Similar to the previous section, we will assume that the contact rate β(t) is a function of time t. Thus, mathematically, Eq. (3.2) where t^{∗} is the time at which control measures were first implemented. As before, we have assumed that t^{∗} = 8 weeks.
The stochastic model (4.1) makes use of the parameters β(t),σ and τ. Of these parameters, estimates of σ and τ can be obtained from existing studies on dengue virus and have been given previously. Therefore, our aim will be to estimate the parameter β(t) using the epidemic data and estimates of initial population sizes. For this purpose, we define the parameter vector, which we will attempt to estimate, as Θ = β(t). Moreover, we define
where τ^{∗} denotes the time at which observations of the epidemic have finished.
Thus, based on the available epidemic data, we have complete knowledge of both C and D but no knowledge of B. This lack of knowledge will be a major cause of uncertainty in our analysis. Nevertheless, we will attempt to estimate R_{0} for both the time period before control measures are implemented and the time period after control measures are implemented using our knowledge of both C and D.
Inference methodology
Based on the definitions given in (4.1) and (4.2), we observe that B(t),C(t) and D(t) are conditionally independent random variables. Thus, the likelihood function for the data set {B,C,D} is given by
where f_{1},f_{2} and f_{3} are the binomial transition probabilities given in (4.1) and (4.2), conditioned on Θ and all the epidemic data represented by B, C and D up until time t. Therefore, the maximum likelihood estimator for the parameter vector Θ, and by extension for β(t) and R_{0} can be obtained by maximizing the expression in (4.4).
According to model (4.1), the time series for S(t),E(t),I(t) and R(t) can be obtained using B(t),C(t) and D(t). Unfortunately as mentioned previously, B(t) is unknown since the process of infection is not observed. Hence, we must also impute the values of B(t). These values can then be used to construct the time series for S(t) and E(t).
Since, the likelihood function for B,C and D, denoted as L(B,C,DΘ), is given by (4.4), we can use Bayes’ Theorem to obtain, up to a constant, the required posterior distribution that we wish to sample from. This is given by
where π(Θ) is the prior distribution. Thus, our MCMC algorithm will sample from the conditional probability distributions π(ΘB,C,D) and π(BΘ,C,D) to produce samples from the required distribution π(Θ,BC,D). In short, our general algorithm will proceed as follows

Initialize the set B using any appropriate initial vector.

Since, C and D are known, construct the time series for S(t),E(t),I(t) and R(t).

Initialize the parameter vector Θ.

Update B using the conditional distribution π(BΘ,C,D).

Reconstruct the new time series for S(t),E(t),I(t) and R(t).

Update Θ using the conditional distribution π(ΘB,C,D).

Repeat steps 46 until the Markov chain has converged and subsequently, the required samples have been obtained.
To sample from π(BΘ,C,D) one can use the conditional binomial distribution for B, making sure that the choice is consistent with the final size and length of the epidemic. This is however computationally very inefficient as most of the draws would be rejected due to the consistency condition. To avoid this issue we condition the proposal on the observed extinction time, following the method described in [33] for computationally efficient sampling. π(ΘB,C,D) is updated using a random walk proposal.
Inference from the observed dengue data
An important question that arises at this point pertains to the meaning and significance of the basic reproduction number R_{0} for the stochastic SEIR model. As mentioned previously and as discussed in detail in [31], the basic reproduction number R_{0} for the deterministic SEIR model is essentially a threshold quantity which determines the possibility of an outbreak of the disease. Thus, for the deterministic SEIR model, if R_{0} is less than unity there is no epidemic while if R_{0} is greater than unity there will be a disease epidemic.
Unfortunately, the threshold dynamics of the stochastic SEIR model are not the same. It can be proven that in contrast to the deterministic model, the stochastic SEIR model predicts disease extinction regardless of the value of R_{0}. This results in difficulty regarding the interpretation of R_{0} as a threshold quantity. Therefore, it is tempting to ask the question: what is the importance of R_{0} in the stochastic SEIR model? An answer to this question may be conjectured (but not proven) by referring to [39]. It is proven in [39] that for the stochastic SI model, on average no epidemic will occur if R_{0} < 1, while for R_{0} > 1 there is a finite probability that an endemic quasiequilibrium will develop. We conjecture that this result also holds true for the stochastic SEIR model and that it can therefore be used to explain the significance of R_{0} as a threshold quantity for the stochastic SEIR model.
Using the MCMC algorithm discussed in the previous section, we estimate the transmission rate β(t) and hence the basic reproduction number R_{0}, both for the time period before control measures are implemented and the time period after the control measures are implemented. We have taken t^{∗} = 8 weeks and the initial population sizes to be the same as in the case of inference from the deterministic model using the GLS and OLS schemes. Furthermore, we have taken an uniform prior distribution for the parameter vector Θ. We observe that the results of the MCMC algorithm, displayed in Table 7, are in close agreement with the corresponding results obtained from application of the OLS and GLS schemes to the directtransmission deterministic model.
References
 1.
Ranjit S, Kissoon N:Dengue hemorrhagic fever and shock syndromes. Pediatr Crit Care Med. 2011, 12: 90100. 10.1097/PCC.0b013e3181e911a7.
 2.
World Health Organization: dengue and severe dengue fact sheet. 2012, [http://www.who.int/mediacentre/factsheets/fs117/en/],
 3.
Gubler DJ:Dengue and dengue hemorrhagic fever. Clin Microbiol Rev. 1998, 11 (3): 480496.
 4.
Halstead S, Nimmannitya S, Cohen S:Observations related to pathogenesis of dengue hemorrhagic fever. IV. Relation of disease severity to antibody response and virus recovered. Yale J Biol Med. 1970, 42 (5): 311322.
 5.
Kautner I, Robinson MJ, Kuhnle U:Dengue virus infection: epidemiology, pathogenesis, clinical presentation, diagnosis, and prevention. J Pediatr. 1997, 131 (4): 516524. 10.1016/S00223476(97)700544.
 6.
Shekhar C:Deadly dengue: new vaccines promise to tackle this escalating global menace. Chem Biol. 2007, 14 (8): 871872. 10.1016/j.chembiol.2007.08.004.
 7.
Holmes EC, Twiddy SS:The origin, emergence and evolutionary genetics of dengue virus. Infect Genet Evol. 2003, 3: 1928. 10.1016/S15671348(03)000042.
 8.
Whitehorn J, Farrar J:Dengue. Br Med Bull. 2010, 95: 161173. 10.1093/bmb/ldq019.
 9.
Gubler D, Kuno G: Dengue and Dengue Hemorrhagic Fever. 1997, London: CAB INTERNATIONAL
 10.
Kawaguchi I, Sasaki A, Boots M:Why are dengue virus serotypes so distantly related? Enhancement and limiting serotype similarity between dengue virus strains. Proc R Soc Lond B Biol Sci. 2003, 270 (1530): 22412247. 10.1098/rspb.2003.2440.
 11.
Garba SM, Gumel AB:Abu Bakar MR: Backward bifurcations in dengue transmission dynamics. Math Biosci. 2008, 215: 1125. 10.1016/j.mbs.2008.05.002.
 12.
Garba S, Gumel A:Effect of crossimmunity on the transmission dynamics of two strains of dengue. Int J Comput Math. 2010, 87 (10): 23612384. 10.1080/00207160802660608.
 13.
Wearing HJ, Rohani P:Ecological and immunological determinants of dengue epidemics. Proc Natl Acad Sci. 2006, 103 (31): 1180211807. 10.1073/pnas.0602960103.
 14.
Esteva L, Vargas C:Coexistence of different serotypes of dengue virus. J Math Biol. 2003, 46: 3147. 10.1007/s0028500201684.
 15.
Ferguson N, Anderson R, Gupta S:The effect of antibodydependent enhancement on the transmission dynamics and persistence of multiplestrain pathogens. Proc Natl Acad Sci. 1999, 96 (2): 790794. 10.1073/pnas.96.2.790.
 16.
Esteva L, Vargas C:A model for dengue disease with variable human population. J Math Biol. 1999, 38 (3): 220240. 10.1007/s002850050147.
 17.
Esteva L, Vargas C:Analysis of a dengue disease transmission model. Math Biosci. 1998, 150 (2): 131151. 10.1016/S00255564(98)100032.
 18.
Chowell G, DiazDueñas P, Miller J, AlcazarVelazco A, Hyman J, Fenimore P, CastilloChavez C:Estimation of the reproduction number of dengue fever from spatial epidemic data. Math Biosci. 2007, 208 (2): 571589. 10.1016/j.mbs.2006.11.011.
 19.
Allen LJ:An introduction to stochastic epidemicmodels. Mathematical Epidemiology, Volume 1945 of Lecture Notes in Mathematics. Edited by: Brauer F, Driessche P, Wu J. 2008, Springer Berlin Heidelberg, 14197 Berlin Germany, 81130.
 20.
Keeling MJ, Ross JV:On methods for studying stochastic disease dynamics. J R Soc Interface. 2008, 5 (19): 171181. 10.1098/rsif.2007.1106.
 21.
Bailey NT:A simple stochastic epidemic. Biometrika. 1950, 37: 193202. 10.1093/biomet/37.34.193.
 22.
Allen LJ, Flores DA, Ratnayake RK, Herbold JR:Discretetime deterministic and stochastic models for the spread of rabies. Appl Math Comput. 2002, 132 (2): 271292.
 23.
Weiss GH, Dishon M:On the asymptotic behavior of the stochastic and deterministic models of an epidemic. Math Biosci. 1971, 11 (3): 261265.
 24.
Tuite AR, Tien J, Eisenberg M, Earn DJ, Ma J, Fisman DN:Cholera epidemic in Haiti, 2010: using a transmission model to explain spatial spread of disease and identify optimal control interventions. Ann Intern Med. 2011, 154 (9): 593601. 10.7326/00034819154920110503000334.
 25.
Allen L, Driessche P:Stochastic epidemic models with a backward bifurcation. Math Biosci Eng. 2006, 3 (3): 445
 26.
de Souza DR, Tomé T, Pinho ST, Barreto FR, de Oliveira MJ:Stochastic dynamics of dengue epidemics. Phys Rev E. 2013, 87: 012709
 27.
Spencer S: Stochastic epidemic models for emerging diseases. PhD thesis. 2008, University of Nottingham
 28.
Allen LJ: An Introduction to Stochastic Processes with Applications to Biology. 2003, New Jersey: Pearson Education
 29.
Allen LJ, Burgin AM:Comparison of deterministic and stochastic SIS and SIR models in discrete time. Math Biosci. 2000, 163: 133. 10.1016/S00255564(99)000474.
 30.
CintrónArias A, CastilloChávez C, Bettencourt LM, Lloyd AL, Banks H:The estimation of the effective reproductive number from disease outbreak data. Math Biosci Eng. 2009, 6 (2): 261282.
 31.
Van den Driessche P, Watmough J:Reproduction numbers and subthreshold endemic equilibria for compartmental models of disease transmission. Math Biosci. 2002, 180: 2948. 10.1016/S00255564(02)001086.
 32.
Chowell G, Hengartner N, CastilloChavez C, Fenimore F, Hyman J:The basic reproductive number of Ebola and effects of public health measures: the cases of Congo and Uganda. J Theor Biol. 2004, 229: 119126. 10.1016/j.jtbi.2004.03.006.
 33.
Lekone PE, Finkenstädt BF:Statistical inference in a stochastic epidemic SEIR model with control intervention: Ebola as a case study. Biometrics. 2006, 62 (4): 11701177. 10.1111/j.15410420.2006.00609.x.
 34.
O’Neill P, Roberts GO:Bayesian inference for partially observed stochastic epidemics. J R Statisitcal Soc A. 1999, 162: 121129. 10.1111/1467985X.00125.
 35.
Sanchez MA, Blower SM:Uncertainty and sensitivity analysis of the basic reproductive rate: tuberculosis as an example. Am J Epidemiol. 1997, 145 (12): 11271137. 10.1093/oxfordjournals.aje.a009076.
 36.
Suaya JA, Shepard DS, Beatty ME:Dengue: burden of disease and costs of illness. TDR. Report of the Scientific Working Group Meeting on Dengue. 2006, Geneva Switzerland: World Health Organization, 3549.
 37.
Beatty ME, Beutels P, Meltzer MI, Shepard DS, Hombach J, Hutubessy R, Dessis D, Coudeville L, Dervaux B, Wichmann O, Margolis HS, Kuritsky JN:Health economics of dengue: a systematic literature review and expert panel’s assessment. Am J Trop Med Hyg. 2011, 84 (3): 473488. 10.4269/ajtmh.2011.100521.
 38.
Banks HT, Davidian M, Jr Samuels JR, Sutton KL: An Inverse Problem Statistical Methodology Summary. 2009, 3994 AK Houten Netherland
 39.
Jacquez JA, O’Neill P:Reproduction numbers and thresholds in stochastic epidemic models I. Homogeneous populations. Math Biosci. 1991, 107 (2): 161186. 10.1016/00255564(91)900032.
Acknowledgements
The authors would like to acknowledge and thank the Punjab Disaster Management Authority for providing the data used for the research work being presented in this article.
Author information
Additional information
Competing interests
The authors declare that they have no competing interests.
Authors’ contributions
MI and AK gave initial input and directed the research. MH implemented the algorithms, carried out the programming and drafted the manuscript. AK interpreted the results and edited the manuscript. All authors read and approved the final manuscript.
Electronic supplementary material
Authors’ original submitted files for images
Below are the links to the authors’ original submitted files for images.
Rights and permissions
About this article
Cite this article
Khan, A., Hassan, M. & Imran, M. Estimating the basic reproduction number for singlestrain dengue fever epidemics. Infect Dis Poverty 3, 12 (2014) doi:10.1186/20499957312
Received
Accepted
Published
DOI
Keywords
 Epidemiology
 Dengue fever
 Statistical inference
 Stochastic model
 Markov chain Monte Carlo