A Sparse Representation of Social Media, Internet Query, and Surveillance Data to Forecast Dengue Case Number using Hybrid Decomposition-Bidirectional LSTM
Authors: Wiwik Anggraeni, Eko Mulyanto Yuniarno, Reza Fuad Rachmadi, Pujiadi, Mauridhi Hery Purnomo
Dengue fever is an endemic disease that occurs throughout the year. Forecasting dengue fever cases from actual data is needed for monitoring and for taking action. Recently, developing countries have faced problems with their dengue fever surveillance systems caused by delays in the data. On the other hand, the availability of and access to health-related information on the internet have changed people's behaviors and habits. However, the effect of using internet data has not been widely studied, especially in areas with different levels of internet penetration. This study examines the impact of reported dengue fever case data, Google Trends, Twitter, and climate data on forecasting dengue fever cases in areas with many cases and varying levels of internet penetration. Split time-series cross-validation (STSCV) and blocked time-series cross-validation (BTSCV) are used to obtain various training and testing results. A hybrid Decomposition-Bidirectional Long Short-Term Memory (D-BiLSTM) method is proposed and applied to eight different scenarios across areas at multiple levels. According to the experimental results, the D-BiLSTM model with STSCV outperforms the model with BTSCV. In the high internet penetration area, the average error is 9,517, while in the low internet penetration area, it is 5,188. In areas with high internet penetration, adding the Google Trends and Twitter variables does not significantly reduce the forecasting error. However, in the low penetration area, including Google Trends and Twitter significantly decreases the error. In general, the D-BiLSTM model performed well. Compared with other approaches, the D-BiLSTM model as a whole reduces the average RMSE and the average MAE of the comparison models by 94,120 and 45,132, respectively, in areas of high internet penetration, with a best-model SMAPE of 0.310. In the low internet penetration area, the average decline in RMSE and MAE was 54,390 and 19,362, respectively, with a best-model SMAPE of 0.183.
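The two validation schemes named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the fold counts, window sizes, or exact fold layout used in the study, so the function names `split_tscv` and `blocked_tscv`, their parameters, and the expanding-window interpretation of STSCV are assumptions made only for illustration.

```python
from typing import Iterator, List, Tuple

Fold = Tuple[List[int], List[int]]  # (train indices, test indices)


def split_tscv(n_samples: int, n_folds: int, test_size: int) -> Iterator[Fold]:
    """Expanding-window splits (assumed reading of STSCV).

    Each fold trains on every observation before its test block, so the
    training set grows from fold to fold while test blocks never overlap.
    """
    for k in range(n_folds):
        test_start = n_samples - (n_folds - k) * test_size
        if test_start <= 0:
            raise ValueError("not enough samples for the requested folds")
        train = list(range(0, test_start))
        test = list(range(test_start, test_start + test_size))
        yield train, test


def blocked_tscv(n_samples: int, n_folds: int, train_ratio: float = 0.8) -> Iterator[Fold]:
    """Blocked splits (assumed reading of BTSCV).

    The series is cut into disjoint consecutive blocks; inside each block
    the earlier part is used for training and the later part for testing,
    so no fold sees data from another fold.
    """
    block = n_samples // n_folds
    for k in range(n_folds):
        start = k * block
        cut = start + int(block * train_ratio)
        if cut == start or cut == start + block:
            raise ValueError("block too small for the requested train_ratio")
        yield list(range(start, cut)), list(range(cut, start + block))


if __name__ == "__main__":
    for name, folds in (("split", split_tscv(12, 3, 2)),
                        ("blocked", blocked_tscv(12, 3, 0.75))):
        for train, test in folds:
            print(name, "train", train, "test", test)
```

Under these assumptions, the split scheme gives later folds progressively more history, whereas each blocked fold trains on a short, isolated window. That difference in available training data is one plausible reason the two schemes can yield different forecasting errors, though the abstract itself does not state the cause.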