Forecasting incidence of hemorrhagic fever with renal syndrome in China using ARIMA model
BMC Infectious Diseases volume 11, Article number: 218 (2011)
China is the country most seriously affected by hemorrhagic fever with renal syndrome (HFRS), accounting for 90% of HFRS cases reported globally. At present, the HFRS situation in China is worsening, with increasing numbers of cases and natural foci. Therefore, there is an urgent need to monitor and predict HFRS incidence to make the control of HFRS more effective. In this study, we applied a stochastic autoregressive integrated moving average (ARIMA) model with the objective of monitoring and short-term forecasting of HFRS incidence in China.
Chinese HFRS data from 1975 to 2008 were used to fit the ARIMA model. The Akaike Information Criterion (AIC) and the Ljung-Box test were used to evaluate the constructed models. Subsequently, the fitted ARIMA model was applied to obtain the fitted HFRS incidence from 1978 to 2008, which was compared with the corresponding observed values. To assess the validity of the proposed model, the mean absolute percentage error (MAPE) between the observed and fitted HFRS incidence (1978-2008) was calculated. Finally, the fitted ARIMA model was used to forecast the incidence of HFRS for the years 2009 to 2011. All analyses were performed using SAS 9.1 with a significance level of p < 0.05.
The goodness-of-fit test of the optimum ARIMA(0,3,1) model showed non-significant autocorrelations in the residuals of the model (Ljung-Box Q statistic = 5.95, P = 0.3113). The fitted values produced by the ARIMA(0,3,1) model for the years 1978-2008 closely followed the observed values for the same years, with a mean absolute percentage error (MAPE) of 12.20%. The forecast values for 2009 to 2011 were 0.69, 0.86, and 1.21 per 100,000 population, respectively.
ARIMA models applied to historical HFRS incidence data are an important tool for HFRS surveillance in China. This study shows that accurate forecasting of the HFRS incidence is possible using an ARIMA model. If predicted values from this study are accurate, China can expect a rise in HFRS incidence.
Hemorrhagic fever with renal syndrome (HFRS), also known as epidemic hemorrhagic fever (EHF), is an acute viral syndrome caused by infection with one of the hantaviruses. HFRS is an important infectious disease in developing countries. In China, HFRS is caused mainly by two types of hantavirus, Hantaan virus (HTNV) and Seoul virus (SEOV), each of which has coevolved with a distinct rodent host. HTNV is associated with Apodemus agrarius, whereas SEOV, which causes a less severe form of HFRS, is associated with Rattus norvegicus. In hantavirus-endemic areas, HFRS is most common among farmers and others who may have close contact with excreta of infected rodents [2, 3]. In mainland China, HFRS remains a serious public health problem with approximately 20,000-50,000 human cases reported annually, approximately 90% of the total cases worldwide [4–6]. Currently, HFRS is endemic in 28 of 31 provinces in mainland China [4, 7].
In response to the spread of HFRS in China, the Chinese Center for Disease Control and Prevention designed a surveillance system for HFRS and created educational programs for the general public. However, the impact of control efforts remains difficult to measure due to the inherent complexities of HFRS as a disease: multiple viral strains with identified genetic polymorphisms, complex disease manifestation, diverse animal reservoirs, and multiple routes of transmission. Infectious diseases have certain characteristic features that lend themselves to modeling, such as the speed of pathogen variation, the accumulation of susceptible hosts, and environmental indices. Thus, epidemic modeling and forecasting can be essential tools to prevent and control HFRS. Recently, statistical methods including linear regression [10–12], correlation coefficients, the grey swing model, and the back-propagation artificial neural network model have been used to predict HFRS incidence. The variation of HFRS incidence, which is influenced and constrained by diverse factors, is characterized by both trend and randomness. These statistical tools are inappropriate for analyzing the randomness of HFRS. Autoregressive integrated moving average (ARIMA) models, which take into account changing trends, periodic changes, and random disturbances in time series, are very useful in modeling the temporal dependence structure of a time series. In epidemiology, ARIMA models have been successfully applied to predict the incidence of infectious diseases, such as influenza mortality, malaria incidence, as well as other infectious diseases [18, 19]. This study aimed to develop a univariate time series model for HFRS incidence, specifically a stochastic ARIMA model, for short-term forecasting of HFRS incidence (per 100,000 population) in China.
Chinese HFRS incidence data from 1975 to 2008 were obtained from the Chinese Center for Disease Control and Prevention. All HFRS cases were initially diagnosed by clinical symptoms. Patient blood samples were also collected and sent to local Centers for Disease Control and Prevention (CDC) laboratories for serological confirmation. Finally, data were collected by case number according to the sampling results. There might be admission rate bias in the disease reports, but this has been reduced as much as possible. In China, HFRS is a nationally notifiable disease and hospital physicians must report every case of HFRS to the local health authority within 12 hours. Local health authorities later report monthly HFRS case totals to the higher, national-level CDC for surveillance purposes. Due to mandatory reporting, it is believed that the degree of compliance in disease notification over the study period was consistent.
We used the Box-Jenkins approach to ARIMA(p, d, q) modeling of time series. This model-building process is designed to take advantage of associations in the sequentially lagged relationships that usually exist in periodically collected data. The following parameters were selected when fitting the ARIMA model: p, the order of autoregression; d, the degree of differencing; q, the order of the moving average.
The annual data used in this study did not show a seasonal pattern, so the series was differenced at the non-seasonal level to induce stationarity. Plots of the autocorrelation function (ACF) and the partial autocorrelation function (PACF) were used to identify the orders of the moving average (MA) and autoregressive (AR) terms included in the ARIMA model. Estimates of the model's parameters were obtained by the conditional least squares method. Diagnostic checking, including residual analysis and the Akaike Information Criterion (AIC), was used to compare the goodness-of-fit among ARIMA models. The Ljung-Box test was used to assess the autocorrelations of the residuals. In addition, we used the mean absolute percentage error (MAPE) and a fitting effect diagram to assess forecast accuracy.
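The residual check described above can be sketched in a few lines. This is a minimal illustration using NumPy only, not the SAS procedure used in the study; the function names `sample_acf` and `ljung_box_q` are hypothetical, and the input series is a made-up alternating sequence rather than model residuals from the HFRS data.

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a 1-D series."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()                      # centre the series
    denom = np.sum(xc ** 2)                # lag-0 sum of squares
    return np.array([np.sum(xc[k:] * xc[:-k]) / denom
                     for k in range(1, max_lag + 1)])

def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k r_k^2 / (n - k)."""
    n = len(residuals)
    r = sample_acf(residuals, max_lag)
    lags = np.arange(1, max_lag + 1)
    return float(n * (n + 2) * np.sum(r ** 2 / (n - lags)))

# Illustrative input only: a strongly autocorrelated alternating series.
demo = [1, -1] * 5
print(sample_acf(demo, 1)[0])   # lag-1 autocorrelation
print(ljung_box_q(demo, 1))     # Q statistic at lag 1
```

Under the null hypothesis of uncorrelated residuals, Q is usually compared with a chi-square distribution whose degrees of freedom equal the number of lags minus the number of estimated ARMA parameters; a large p-value indicates no evidence of remaining autocorrelation.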
The MAPE is defined as

$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\frac{\left|x_t - \hat{x}_t\right|}{x_t} \times 100\%,$$

where $x_t$ and $\hat{x}_t$ denote the observed and fitted values at time point $t$, and $n$ is the number of time points compared. The MAPE value was calculated based on observed values and fitted values from 1978 to 2008. A lower MAPE value indicates a better fit of the data. Finally, the fitted ARIMA model was used for short-term forecasting of HFRS incidence for the years 2009 to 2011. All analyses were performed using SAS 9.1 with a significance level of p < 0.05.
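As a minimal sketch of the MAPE calculation, assuming the usual definition (the mean of the absolute errors divided by the observed values, expressed as a percentage): the helper name `mape` is hypothetical and the numbers below are invented for illustration, not HFRS incidence values.

```python
import numpy as np

def mape(observed, fitted):
    """Mean absolute percentage error between observed and fitted values."""
    observed = np.asarray(observed, dtype=float)
    fitted = np.asarray(fitted, dtype=float)
    return float(np.mean(np.abs(observed - fitted) / np.abs(observed)) * 100.0)

# Invented example values (not study data): errors of 10%, 10% and 0%.
print(mape([10.0, 20.0, 40.0], [9.0, 22.0, 40.0]))
```

Because each error is scaled by the observed value, the same absolute error weighs more heavily in low-incidence years than in high-incidence years.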
The present study was reviewed by the research institutional review board of Shandong University and the China CDC, which found that the use of disease surveillance and meteorological data did not require oversight by an ethics committee.
From 1975 to 1986, the HFRS incidence in China rose steadily, peaking in 1986 at 11.06 cases per 100,000 population. After 1986, the incidence declined sharply, with marked fluctuations, until 2008 (Figure 1). The lowest incidence was observed in 2008, at 0.68 per 100,000 population.
According to Figure 1, the series showed a non-stationary mean, so we stabilized the mean of HFRS incidence by taking both second- and third-order differences. All further statistical procedures were performed on the transformed HFRS incidence. Based on the distribution characteristics, we fitted five candidate models: ARIMA(0,2,1), ARIMA(1,2,1), ARIMA(0,3,1), ARIMA(1,3,1), and ARIMA(2,3,1). Of all the models tested, the ARIMA(0,3,1) model was the best fit for the data (Table 1). The series transformed by taking third-order differences is shown in Figure 2. The plots of the ACF and PACF (Figure 3) describe the temporal dependence structure in HFRS incidence. The slow decay in the PACF, together with an ACF cutoff at lag 1, suggested an MA(q = 1) term.
The parameter estimates for the optimum ARIMA(0,3,1) model are shown in Table 2. The model's fitted (1975-2008) and predicted values (2009-2011) are presented in Figure 4. The MAPE value was 12.20%. The forecast values for the years 2009, 2010, and 2011 were 0.69, 0.86, and 1.21 per 100,000 population, respectively.
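The effect of third-order differencing can be illustrated with a short sketch. The series below is a synthetic cubic trend chosen for illustration, not the HFRS data; only the year range (1975 to 2008) is taken from the study. It shows why a series of 34 annual observations yields 31 differenced values, the first of which corresponds to 1978.

```python
import numpy as np

years = np.arange(1975, 2009)                 # 34 annual time points
t = (years - years[0]).astype(float)

# Synthetic cubic trend (assumption for illustration, NOT real incidence).
x = 0.01 * t**3 - 0.3 * t**2 + 2.0 * t + 1.0

w = np.diff(x, n=3)                           # third-order difference
w_years = years[3:]                           # each difference needs 3 earlier points

print(len(x), len(w))                         # 34 31
print(w_years[0])                             # 1978
print(np.allclose(w, 0.06))                   # a cubic trend becomes a constant
```

Each order of differencing removes one polynomial degree of trend and costs one observation at the start of the series.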
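For reference, an ARIMA(0,3,1) model can be written out explicitly. The form below is a sketch that assumes no constant term and the Box-Jenkins sign convention for the moving average parameter $\theta$; the estimated value of $\theta$ is the one reported in Table 2 and is not reproduced here. $B$ denotes the backshift operator and $\varepsilon_t$ the residual at time $t$.

```latex
\begin{aligned}
(1 - B)^3 x_t &= (1 - \theta B)\,\varepsilon_t \\
x_t &= 3x_{t-1} - 3x_{t-2} + x_{t-3} + \varepsilon_t - \theta\,\varepsilon_{t-1} \\[4pt]
\hat{x}_{n+1} &= 3x_n - 3x_{n-1} + x_{n-2} - \theta\,\hat{\varepsilon}_n \\
\hat{x}_{n+2} &= 3\hat{x}_{n+1} - 3x_n + x_{n-1} \\
\hat{x}_{n+3} &= 3\hat{x}_{n+2} - 3\hat{x}_{n+1} + x_n
\end{aligned}
```

Under these assumptions, only the one-step forecast uses the last residual directly; the two- and three-step forecasts extrapolate from the most recent observed and forecast values.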
Time series analysis of surveillance data on the incidence of various infections is very helpful in developing hypotheses to explain and anticipate the dynamics of the observed phenomena, and subsequently in establishing a quality control system and reallocating resources. The ARIMA model is one of the most widely used time-series forecasting techniques because of its structured modeling basis and acceptable forecasting performance. In this paper, we applied an ARIMA(p, d, q) model to analyze the surveillance data of HFRS in China. Disease monitoring by public health departments entails ongoing data collection, processing, and updating. The national-level China CDC is the appropriate level of organization for implementing an ARIMA predictive model, because reported data are continually received and updated there. We found that model predictions are further improved by the assured availability of the Health Department data. In this study, we have obtained an ARIMA model that closely fits HFRS incidence in China. Because the series is annual and the selected model contains a single moving average term, the model implies that the incidence of HFRS in a given year can be estimated using the residual from the preceding year. According to the results above, the constructed model is reliable, with high validity. Once a satisfactory model has been obtained, it can be used to forecast expected numbers of cases for a given number of future time intervals. Thus, the fitted ARIMA(0,3,1) model can be used to predict the next three years' HFRS incidence in China. The forecast results suggest that the HFRS incidence in China will experience slight growth in the next three years (2009-2011). A rise in HFRS incidence may also result from an increase in the number and size of natural foci and from climate change, especially rising mean temperature [26, 27].
Therefore, knowledge of HFRS forecasts is necessary to prompt health departments to strengthen surveillance systems and reallocate resources in anticipation of increasing HFRS incidence.
Several studies have used ARIMA models to fit and predict changing trends in infectious disease. Luz et al. applied an ARIMA(2,0,0)×(1,0,0)₁₂ model to predict dengue incidence in Rio de Janeiro and found that ARIMA models were useful tools for monitoring dengue incidence. Earnest et al. indicated that ARIMA models provided useful tools for administrators and clinicians in planning for real-time bed capacity during infectious disease outbreaks such as SARS. Li et al. applied an ARIMA model to the monthly incidence of HFRS in Linyi City, China, and found that the ARIMA model could predict HFRS incidence with high precision in the short term. The present study further supports the consensus that the ARIMA model is a useful tool for monitoring and predicting changing trends in infectious diseases.
To the best of our knowledge, this is the first study to apply an ARIMA model to HFRS incidence in China with as many as 34 annual observations. Some previous studies [30, 31] in China also used ARIMA models to fit and forecast HFRS incidence in some regions, but they shared the problem of too few observations, which made their forecasts unstable. To construct a stable and effective ARIMA model, at least 30 observations are needed. With that many observations, the parameter estimates of the fitted model are more robust. The longer the series, the better; however, the series should not extend so far into the past as to include periods during which a different case definition was applied or in which any other reporting artifact resulted in a mean number of cases per interval that differs from the mean of recent intervals. As mentioned above, for adequate ARIMA modeling, a time series should be stationary with respect to mean and variance. If the mean or the variance increases or decreases over time, the series may need to be transformed to make it stationary before being modeled. Otherwise, the predictive performance of the model will be poor.
To improve the model, updating the forecasts is very important. A model without seasonal terms will need to be updated frequently. Confidence intervals that widen rapidly as time increases from the starting point of the forecasts also indicate a model that needs frequent updating. Generally speaking, there are two ways to implement the updating. The existing model can be reapplied to the original series with extra observations added at the end, to give forecasts based on a later starting point. Alternatively, a new model can be fitted to the longer series. The latter is probably preferable, since fitting a model is quick, especially when the old model is used as a guide, and it makes better use of the additional observations.
Some limitations of this study also need to be taken into account when interpreting the results. In this study, the interval of HFRS incidence is one year, so we could not analyze its seasonal characteristics. In further studies, we would use monthly data to predict HFRS incidence in order to capture the seasonal pattern and achieve higher predictive precision. In addition, the data are from a passive surveillance system, so possible biases in disease reporting and potential underreporting of HFRS cases might influence the precision of our analysis.
There is an urgent need for monitoring and predicting HFRS incidence to reduce the substantial morbidity and mortality caused by this disease. ARIMA models applied to historical HFRS incidence data are an important tool for HFRS surveillance. Accurate forecasting of the incidence of HFRS is possible. Our modeling approach can be used to monitor and predict HFRS incidence in China. The ARIMA model could be used to optimize HFRS prevention by providing estimates of HFRS incidence trends in China.
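The two updating strategies described above can be sketched as follows. This is an illustrative NumPy implementation, not the SAS procedure used in the study: it assumes an ARIMA(0,3,1) model without a constant term, estimates the moving average parameter by a simple grid search on the conditional sum of squares, and uses hypothetical function names (`ma1_residuals`, `fit_ma1_cls`, `forecast_arima031`). Passing a previously estimated `theta` reapplies the old model to the extended series; omitting it refits the model.

```python
import numpy as np

def ma1_residuals(w, theta):
    """Residuals of w_t = e_t - theta * e_{t-1}, conditional on e_0 = 0."""
    e = np.zeros(len(w))
    prev = 0.0
    for i, wi in enumerate(w):
        prev = wi + theta * prev          # e_t = w_t + theta * e_{t-1}
        e[i] = prev
    return e

def fit_ma1_cls(w, grid=None):
    """Grid-search conditional least squares estimate of theta."""
    if grid is None:
        grid = np.linspace(-0.99, 0.99, 199)
    sse = [np.sum(ma1_residuals(w, th) ** 2) for th in grid]
    return float(grid[int(np.argmin(sse))])

def forecast_arima031(x, steps=3, theta=None):
    """Forecast `steps` ahead; refit theta unless an old estimate is given."""
    x = np.asarray(x, dtype=float)
    w = np.diff(x, n=3)                   # third-order differenced series
    if theta is None:
        theta = fit_ma1_cls(w)            # strategy 2: refit on the longer series
    e_last = ma1_residuals(w, theta)[-1]
    hist = list(x)
    out = []
    for h in range(steps):
        w_hat = -theta * e_last if h == 0 else 0.0   # MA(1) memory lasts one step
        nxt = 3 * hist[-1] - 3 * hist[-2] + hist[-3] + w_hat
        hist.append(nxt)
        out.append(float(nxt))
    return theta, out

# Illustrative series only (a pure quadratic, not HFRS data).
theta_hat, forecasts = forecast_arima031([0, 1, 4, 9, 16, 25])
print(forecasts)
```

When a new annual observation arrives, it can be appended to the series and the function called again, either with the earlier `theta` (reapplying the old model) or without it (refitting).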
Fang LQ, Zhao WJ, Vlas SJ, Zhang WY, Liang S, Looman CWN, Yan L, Wang LP, Ma JQ, Feng D, Yang H, Cao WC: Spatiotemporal dynamics of hemorrhagic fever with renal syndrome, Beijing, People's Republic of China. Emerging Infectious Diseases. 2009, 15: 2043-2045.
Vapalahti K, Paunio M, Brummer-Korvenkontio M, Vaheri A, Vapalahti O: Puumala virus infections in Finland: increased occupational risk for farmers. Am J Epidemiol. 1999, 149 (12): 1142-1151.
Glass GE, Childs JE, Korch GW, LeDuc JW: Association of intraspecific wounding with hantaviral infection in wild rats (Rattus norvegicus). Epidemiol Infect. 1988, 101: 459-472. 10.1017/S0950268800054418.
Zhang YZ, Xiao DL, Wang Y, Wang HX, Sun L, Tao XX, Qu YG: The epidemic characteristics and preventive measures of hemorrhagic fever with renal syndrome in China. Chin J Epidemiol. 2004, 25 (6): 466-469.
Ulrich R, Hjelle B, Pitra C, Kruger DH: Emerging viruses: the case 'hantavirus'. Intervirology. 2002, 45 (4): 318-322. 10.1159/000067924.
Fang LQ, Yan L, Liang S, Vlas SJ, Feng D, Han X, Zhao W, Xu B, Bian L, Yang H, Gong P, Richardus JH, Cao WC: Spatial analysis of hemorrhagic fever with renal syndrome in China. BMC Infect Dis. 2006, 6: 77-10.1186/1471-2334-6-77.
Yan L, Fang LQ, Huang HG, Zhang LQ, Feng D, Zhao WJ, Zhang WY, Li XW, Cao WC: Landscape elements and Hantaan virus-related hemorrhagic fever with renal syndrome, People's Republic of China. Emerg Infect Dis. 2007, 13 (9): 1301-1306.
Zhang Y: The epidemiological research status and problems and prospects of hemorrhagic fever with renal syndrome in China. Chin J Vector Bio & Control. 2002, 13 (2): 85-88.
Guan P, Huang DS, Zhou BS: Forecasting model for the incidence of hepatitis A based on artificial neural network. World J Gastroenterol. 2004, 10 (24): 3579-3582.
Wang YJ, Zhao TQ, Wang P, Li SQ, Huang Z, Yang GQ, Li XY, Liu B: Applying linear regression statistical method to predict the epidemic of hemorrhagic fever with renal syndrome. Chin J Vector Bio & Control. 2006, 17 (4): 333-334.
Olsson GE, Hjertqvist M, Lundkvist A, Hornfeldt B: Predicting high risk for human hantavirus infections, Sweden. Emerg Infect Dis. 2009, 15 (1): 104-106. 10.3201/eid1501.080502.
Bi P, Wu XK, Zhang FZ, Parton KA, Tong SL: Seasonal rainfall variability, the incidence of hemorrhagic fever with renal syndrome, and prediction of the disease in low-lying areas of China. Am J Epidemiol. 1998, 148 (3): 276-281.
Clement J, Vercauteren J, Verstraeten WW, Ducoffre G, Barrios JM, Vandamme AM, Maes P, Ranst MV: Relating increasing hantavirus incidences to the changing climate: the mast connection. International J of Health Geographics. 2009, 8: 1-10.1186/1476-072X-8-1.
Guo LC, Wu W, Guo JQ, Wang P, Zhou BS: Appling grey swing model to predict the incidence trend of hemorrhagic fever with renal syndrome in Shenyang. Journal of China Medical University. 2008, 37 (6): 839-842.
Wu ZM, Wu W, Wang P, Zhou BS: Prediction for incidence of hemorrhagic fever with renal syndrome with back propagation artificial neural network model. Chin J Vector Bio & Control. 2006, 17 (3): 223-226.
Reichert TA, Simonsen L, Sharma A, Pardo SA, Fedson DS, Miller MA: Influenza and the winter increase in mortality in the United States, 1959-1999. Am J Epidemiol. 2004, 160 (5): 492-502. 10.1093/aje/kwh227.
Gaudart J, Toure O, Dessay N, Dicko AL, Ranque S, Forest L, Demongeot J, Doumbo OK: Modelling malaria incidence with environmental dependency in a locality of Sudanese savannah area, Mali. Malaria Journal. 2009, 8: 61-10.1186/1475-2875-8-61.
Luz PM, Mendes BV, Codeco CT, Struchiner CJ, Galvani AP: Time series analysis of dengue incidence in Rio de Janeiro, Brazil. Am J Trop Med Hyg. 2008, 79 (6): 933-939.
Yi J, Du CT, Wang RH, Liu L: Applications of multiple seasonal autoregressive integrated moving average (ARIMA) model on predictive incidence of tuberculosis. Chinese Journal of Preventive Medicine. 2007, 41 (2): 118-121.
Box GEP, Jenkins GM: Time series analysis: forecasting and control. 1976, San Francisco: Holden Day, 181-218.
Akhtar S, Rozi S: An autoregressive integrated moving average model for short-term prediction of hepatitis C virus seropositivity among male volunteer blood donors in Karachi, Pakistan. World J Gastroenterol. 2009, 15 (13): 1607-1612. 10.3748/wjg.15.1607.
Kuhn L, Davidson LL, Durkin MS: Use of poisson regression and time series analysis for detecting changes over time in rates of child injury following a prevention program. Am J Epidemiol. 1994, 140 (10): 943-955.
Wong J, Chan A, Chiang YH: Time series forecasts of the construction labour market in Hong Kong: the Box-Jenkins approach. Construction Management and Economics. 2005, 23 (9): 979-991. 10.1080/01446190500204911.
Allard R: Use of time-series analysis in infectious disease surveillance. Bulletin of the World Health Organization. 1998, 76 (4): 327-333.
Wang XF, Wang MW, Sun H: Epidemiological analysis of hemorrhagic fever with renal syndrome in China from 2004 to 2005. Disease Surveillance. 2007, 22 (5): 307-309.
Bi P, Tong SL, Donald K, Parton K, Ni JF: Climatic, reservoir and occupational variables and the transmission of hemorrhagic fever with renal syndrome in China. Int J of Epidemiol. 2002, 31: 189-193. 10.1093/ije/31.1.189.
Clement J, Vercauteren J, Verstraeten WW, Ducoffre G, Barrios JM, Vandamme AM, Maes P, Ranst MV: Relating increasing hantavirus incidences to the changing climate: the mast connection. International Journal of Health Geographics. 2009, 8: 1-10.1186/1476-072X-8-1.
Earnest A, Chen MI, Ng D, Leo YS: Using autoregressive integrated moving average (ARIMA) models to predict and monitor the number of beds occupied during a SARS outbreak in a tertiary hospital in Singapore. BMC Health Services Research. 2005, 5: 36-10.1186/1472-6963-5-36.
Li XJ, Kang DM, Cao J, Wang JZ: A time series model in incidence forecasting of hemorrhagic fever with renal syndrome. Journal of Shandong University (Health Sciences). 2008, 46 (5): 547-549.
Wu W, Guan P, Guo JQ, Zhou BS: Comparison of GM(1,1) gray model and ARIMA model in forecasting the incidence of hemorrhagic fever with renal syndrome. Journal of China Medical University. 2008, 37 (1): 52-55.
Chen Y, Bai S, Chen HZ, Sun BJ, Wei WJ, Huang M, Wang P: Fitting research on ARMA model in the prediction of incidence trend of hemorrhagic fever with renal syndrome. Modern Preventive Medicine. 2008, 35 (8): 1414-1415.
Gao HX compilation: SAS System·SAS/ETS Software Manual. Beijing: China Statistics Press, 83-
Chen HX, Luo CW: Surveillance of hemorrhagic fever with renal syndrome in China. Chin J Epidemiol. 2002, 23 (1): 63-66.
The pre-publication history for this paper can be accessed here: http://www.biomedcentral.com/1471-2334/11/218/prepub
This study was supported by the social welfare research special program of the Ministry of Science and Technology, China (2003DIA6N009) and Special Infectious Diseases Program of the Ministry of Science & Technology, China (Grant No.2008ZX10004-010).
The authors declare that they have no competing interests.
QL, XL, BJ and WY conceived the study, undertook statistical analysis and drafted the manuscript. XL and BJ assisted with data collection and statistical analysis. All authors contributed to the writing of the manuscript and approved the submitted version of the manuscript.
Qiyong Liu, Xiaodong Liu, Baofa Jiang contributed equally to this work.