Predicting the Next Major Outbreak

When medicine and technology are mentioned together, they evoke images of robots performing surgery, new drug designs, and 3D-printed organs. Technology is not only revolutionizing how medicine is practiced; it is also changing how patients are identified. In the fight against Ebola and other endemic diseases, computer algorithms are leading the way in tracking the progress of disease and identifying areas where individuals are more at risk. Technology may even be able to predict the next outbreak of a disease before it occurs.

Barbara Han, a research scientist at the Cary Institute of Ecosystem Studies, uses computer modeling and machine learning to predict which species can cause outbreaks. Like the Ebola virus, some diseases (malaria, for example) can be transmitted from animals to humans; these are known as zoonotic diseases. Han first models species as potential reservoirs: each model includes the diseases a particular species can carry, the proportion of its population known to carry them, its relative contact with humans, and similar traits. Using data from previous outbreaks, Han then applies algorithms that sort through these models and calculate the probability that a species is a disease reservoir for humans.


The first classification tree created by a computer was not particularly accurate.

Rather than building a single model, Han sorts the data thousands of times, and each pass becomes more accurate by building on the results of the previous ones. In this way the computer is “trained” to classify species more effectively, so that the resulting classifier can be applied to future cases. After one round of sorting, the machine checks its own classifications against previous empirical data and “adjusts” certain parameters before sorting again. By repeating this hundreds or even thousands of times, the machine adapts until it reaches a predetermined level of precision. The resulting classification can then be used to determine which geographic regions are more at risk than others, so that precautions can be taken ahead of time. This stands in stark contrast to the scrambling for aid that occurred just a year ago with Ebola.
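
The "sort, evaluate, adjust, repeat" loop described above resembles a family of techniques known as boosting. The article does not say which algorithm Han uses, so the following is only a minimal sketch under that assumption: it implements AdaBoost over one-question decision trees ("stumps"). The data, the trait scores, and every name in the code are hypothetical and exist only for illustration.

```python
import math

# Hypothetical toy data: one made-up trait score per species, and a label
# saying whether the species is a known reservoir (+1) or not (-1).
TRAIT_SCORES = [(float(v),) for v in range(10)]
IS_RESERVOIR = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]


def stump_predict(stump, x):
    """A one-question tree: is the chosen feature above the threshold?"""
    feature, threshold, sign = stump
    return sign if x[feature] > threshold else -sign


def best_stump(X, y, weights):
    """Pick the stump with the lowest weighted error on the known data."""
    best, best_error = None, float("inf")
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for sign in (1, -1):
                stump = (feature, threshold, sign)
                error = sum(w for x, label, w in zip(X, y, weights)
                            if stump_predict(stump, x) != label)
                if error < best_error:
                    best, best_error = stump, error
    return best, best_error


def ensemble_predict(ensemble, x):
    """Weighted vote of all stumps trained so far."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


def accuracy(ensemble, X, y):
    hits = sum(ensemble_predict(ensemble, x) == label for x, label in zip(X, y))
    return hits / len(y)


def train(X, y, target_accuracy=1.0, max_rounds=50):
    """Repeat 'sort, evaluate, adjust' until the target accuracy is reached."""
    weights = [1.0 / len(X)] * len(X)
    ensemble, history = [], []
    for _ in range(max_rounds):
        stump, error = best_stump(X, y, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0)
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, stump))
        # Adjust: misclassified species get more weight in the next round.
        weights = [w * math.exp(-alpha * label * stump_predict(stump, x))
                   for x, label, w in zip(X, y, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
        history.append(accuracy(ensemble, X, y))
        if history[-1] >= target_accuracy:
            break
    return ensemble, history


ensemble, history = train(TRAIT_SCORES, IS_RESERVOIR)
print("training accuracy per round:", history)
```

On this toy data no single stump can classify all ten species correctly, but the weighted vote of a few stumps can. A real study would also measure accuracy on species held out from training, which this sketch omits.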

However, computer modeling has several identified weaknesses. For one, the parameters and data sets are weighted quite arbitrarily. Depending on which data are designated as the reference set, and on the order of priority initially assigned to the parameters, different models and training runs can yield very different predictions. Furthermore, the choice of which parameters to consider is arbitrary at best. For example, in the past year, the Defense Advanced Research Projects Agency challenged computer modelers to investigate the spread of chikungunya, a viral disease spread by mosquitoes. Among the factors considered were geography, climate, and migration patterns of the tiger mosquito (the primary carrier). Which of these features should be given priority over others? What about human movement and transportation? What about socioeconomic status? As more social determinants are considered, the limited data sets and the relevance of some parameters call their usefulness into question.
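
To make the weighting problem concrete, here is a small sketch of how the same factor scores can rank regions differently depending on which factor is given priority. The regions, scores, and weights are all invented for illustration and do not come from the chikungunya challenge or any real data set.

```python
# Hypothetical, made-up factor scores (0 to 1) for three hypothetical regions.
REGIONS = {
    "region_a": {"climate": 0.9, "mosquito_range": 0.3, "human_travel": 0.2},
    "region_b": {"climate": 0.4, "mosquito_range": 0.5, "human_travel": 0.9},
    "region_c": {"climate": 0.6, "mosquito_range": 0.8, "human_travel": 0.4},
}


def rank_regions(weights):
    """Rank regions from highest to lowest weighted risk score."""
    def score(factors):
        return sum(weights[name] * value for name, value in factors.items())
    return sorted(REGIONS, key=lambda region: score(REGIONS[region]), reverse=True)


# Two equally defensible (and equally arbitrary) sets of priorities.
climate_first = {"climate": 0.7, "mosquito_range": 0.2, "human_travel": 0.1}
travel_first = {"climate": 0.1, "mosquito_range": 0.2, "human_travel": 0.7}

print(rank_regions(climate_first))  # region_a ranks highest
print(rank_regions(travel_first))   # region_b ranks highest
```

Nothing about the underlying scores changed between the two calls; only the modeler's choice of weights did, and the region flagged as most at risk changed with it.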


The third classification tree created by a computer, with considerably improved accuracy.

Nonetheless, computer models have already elucidated the spread of some diseases. In Pakistan, a group of international researchers collaborated with Telenor, a Norwegian mobile provider, to analyze call records from the last seven months of 2013. Using data on mobile user movement between base transceiver stations (cell phone towers), the researchers created a mathematical model of the travel patterns of over 40 million people. Combining this with models of climate and the contagiousness of dengue fever, they were able to accurately predict the geographic spread and timing of dengue outbreaks in 2013. The mathematical model proved to be an accurate predictor, but how effective will it be in the future as climate, travel patterns, and other relevant factors change over time? A more effective model would require a machine learning process similar to the one Han has applied to her sorting algorithms; rather than relying on a fixed reference data set, the machine would adjust its weighted parameters and internal prediction patterns as new data sets become available.
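
One way to picture a model that adjusts its weights as new data arrive is online learning, where each new observation nudges the parameters slightly. The sketch below uses online logistic regression as a stand-in; it is not the Pakistan study's model, and the class name, the two features (inbound travel and climate suitability), and the observations are all hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class OnlineOutbreakModel:
    """Logistic regression updated one observation at a time."""

    def __init__(self, n_features, learning_rate=0.5):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        z = self.bias + sum(w * f for w, f in zip(self.weights, features))
        return sigmoid(z)

    def update(self, features, outbreak_occurred):
        """Shift the weights toward the newly observed outcome (1 or 0)."""
        error = outbreak_occurred - self.predict_proba(features)
        step = self.learning_rate * error
        self.weights = [w + step * f for w, f in zip(self.weights, features)]
        self.bias += step


# Hypothetical stream: (inbound_travel, climate_suitability) -> outbreak?
OBSERVATIONS = [
    ((0.9, 0.8), 1),
    ((0.1, 0.2), 0),
    ((0.8, 0.9), 1),
    ((0.2, 0.1), 0),
]

model = OnlineOutbreakModel(n_features=2)
for _ in range(50):  # replay the stream to stand in for data arriving over time
    for features, outbreak in OBSERVATIONS:
        model.update(features, outbreak)

print(model.predict_proba((0.9, 0.8)), model.predict_proba((0.1, 0.2)))
```

Because the model never freezes its parameters, a shift in travel or climate patterns would gradually show up in the weights as later observations are fed to `update`.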

Technology has already opened the door to alternative forms of medical treatment and extended the boundaries of what can be achieved. Perhaps the potential to predict and prevent future outbreaks will soon be realized as well.

  • Munesh Chauhan

    A very nice and informative writeup.

    I come from a parallel computing background and would like to know which machine learning prediction algorithms are commonly used for disease outbreaks.