DIMACS Launches Five-year Focus on Computational/Mathematical Epidemiology

March 2, 2002

Anthrax toxin lethal factor. One of the DIMACS working groups will consider ways in which mathematical modeling can help in the defense against anthrax or other pathogens released in bioterrorist attacks.
Fred S. Roberts

Interest in infectious diseases has increased greatly in recent years. The emergence of new diseases, such as Lyme disease, HIV/AIDS, hepatitis C, hantavirus, and West Nile virus, has been accompanied by the evolution of antibiotic-resistant strains of the organisms that cause tuberculosis, pneumonia, and gonorrhea. Infectious agents called prions have been discovered as the cause of mad cow disease. Methods of mathematics and computer science have become important tools for analyzing the spread and control of infectious (and noninfectious) diseases. With these developments in mind, DIMACS (the Rutgers-based Center for Discrete Mathematics and Theoretical Computer Science) is conducting an ambitious five-year "special focus" on Computational and Mathematical Epidemiology.

The original plan had been to launch the activity in June 2002. Current concern about the deliberate introduction of diseases like anthrax, smallpox, and plague by bioterrorists, however, led us to advance the starting date. The first meeting, of a working group that will explore the role of the mathematical sciences in defense against bioterrorist attacks, is now scheduled for March 22-23, 2002. We have also modified the themes of some of the later activities. Details about all events scheduled for 2002 appear in the final section of this article.

Methods of Computational Epidemiology
Epidemic models of infectious diseases date back to Daniel Bernoulli's mathematical analysis of smallpox in 1760 and have been developed extensively since the early 1900s. The hundreds of mathematical models produced since that time have explored the effects of bacterial, parasitic, and viral pathogens on human populations. The results have highlighted and formalized such concepts as the core population in sexually transmitted diseases and made explicit other concepts, such as herd immunity for vaccination policies. Key pathogens that have been studied include the malaria parasite, Neisseria gonorrhoeae, M. tuberculosis, HIV, and T. pallidum. Mathematical modeling, with the help of computational tools, has provided new insights into such important issues as drug resistance, rate of spread of infection, epidemic trends, and effects of treatment and vaccination. A smaller but growing literature is concerned with mathematical models of noninfectious diseases, such as cancer.

For many diseases, however, we are far from understanding the mechanisms of disease dynamics. The modeling process can provide insight into and help clarify both data and theories. Mathematical models, with the aid of computer simulations, are useful theoretical and experimental tools for building and testing theories about complex biological systems involving disease, for assessing quantitative conjectures, for determining sensitivities to changes in parameter values, and for estimating key parameters from data. The size of modern epidemiological problems and the large data sets that arise call for powerful computational methods for studying these large and complex models. New computational methods are needed in particular for the dynamics of multiple interacting strains of viruses, through the construction and simulation of dynamic models; for the spatial spread of disease, through pattern analysis and simulation; and for the early detection of emerging diseases or bioterrorist acts, through rapidly responding surveillance systems. The development of such new methods is a goal of the special focus at DIMACS.
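As a small illustration of "estimating key parameters from data," the sketch below fits an exponential growth rate to early outbreak counts by least squares on the logarithm of the counts and converts it to a doubling time. It assumes the counts are all positive and come from the early, roughly exponential phase; the function name and the synthetic data are hypothetical, and real estimation would also have to handle reporting delays and noise.

```python
import math


def estimate_growth_rate(days, cases):
    """Fit log(cases) = a + r * day by least squares; return (r, doubling_time).

    Assumes every count is positive and that the data come from the early,
    roughly exponential phase of an outbreak.
    """
    if len(days) != len(cases) or len(days) < 2:
        raise ValueError("need at least two paired observations")
    logs = [math.log(c) for c in cases]
    n = len(days)
    mean_t = sum(days) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    if sxx == 0:
        raise ValueError("observation days must not all be equal")
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(days, logs))
    r = sxy / sxx  # slope of log(cases) against time = exponential growth rate
    doubling_time = math.log(2) / r if r > 0 else math.inf
    return r, doubling_time


# Synthetic counts that double every three days
days = [0, 3, 6, 9, 12]
cases = [10, 20, 40, 80, 160]
r, doubling = estimate_growth_rate(days, cases)
print(f"growth rate {r:.3f} per day, doubling time {doubling:.1f} days")
```

Refitting on perturbed counts is one crude way to probe how sensitive the estimate is to the data.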

Statistical methods have long been used in epidemiology to evaluate the role of chance and confounding associations. Epidemiologists seek to ferret out sources of systematic error ("bias and confounding") in observations, evaluate the role of uncontrollable error (using statistical methods) in producing the results, and interpret the results using correlative information from the medical and biological sciences. Because the data sets involved are growing ever larger, the role of statistical methods in epidemiology is changing; new methods and approaches, making use of modern information technology, are needed. The special focus will emphasize such methods and approaches.

A smaller but venerable tradition within epidemiology has considered the spread of infectious disease as a dynamical system to which difference and differential equations are applied. But little systematic effort has been made to apply today's powerful computational methods to these dynamical systems models, and few computer scientists or computational mathematicians have been involved in the process. DIMACS hopes to change this situation.
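To make the dynamical-systems view concrete, here is a minimal sketch of the classic SIR (susceptible, infectious, recovered) model. A forward-Euler step turns the differential equations into exactly the kind of difference equation mentioned above. The transmission rate `beta`, recovery rate `gamma`, and population sizes are illustrative values, not estimates for any disease.

```python
def simulate_sir(beta, gamma, s0, i0, r0=0.0, dt=0.1, days=160):
    """Forward-Euler integration of the classic SIR model.

    dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I.
    Returns a list of (t, S, I, R) tuples, one per time step.
    """
    n = s0 + i0 + r0
    s, i, r = s0, i0, r0
    trajectory = [(0.0, s, i, r)]
    for k in range(1, int(round(days / dt)) + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        trajectory.append((k * dt, s, i, r))
    return trajectory


traj = simulate_sir(beta=0.3, gamma=0.1, s0=9990, i0=10)
peak_t, _, peak_i, _ = max(traj, key=lambda row: row[2])
print(f"peak of {peak_i:.0f} infectious at day {peak_t:.1f}")
```

Euler's method is chosen only for transparency; a production model would use a higher-order or adaptive integrator.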

Probabilistic methods, in particular stochastic processes, have also played an important role in epidemiological modeling. Computational methods for simulating stochastic processes in complex spatial environments or on large networks are now making it possible to simulate more and more complex biological interactions. Again, however, few computer scientists or computational mathematicians have been involved in efforts to bring the power of modern computational methods to bear on such problems, and we shall try to remedy this situation.
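One classical stochastic model is the Reed-Frost chain binomial, in which each susceptible escapes infection from each current case independently with probability q = 1 - p. The sketch below simulates it by Monte Carlo; the population size, the per-contact probability `p`, and the threshold of 20 cases used to call an outbreak "major" are all illustrative assumptions.

```python
import random


def reed_frost(n_susceptible, n_infected, p, rng):
    """One realization of the Reed-Frost chain-binomial epidemic model.

    In each generation every susceptible independently escapes infection by
    each current case with probability q = 1 - p, so it is infected with
    probability 1 - q**cases. Returns the number of cases per generation.
    """
    q = 1.0 - p
    s, c = n_susceptible, n_infected
    history = [c]
    while c > 0 and s > 0:
        prob_infection = 1.0 - q ** c
        new_cases = sum(1 for _ in range(s) if rng.random() < prob_infection)
        s -= new_cases
        c = new_cases
        history.append(c)
    return history


rng = random.Random(2002)  # fixed seed so runs are reproducible
sizes = [sum(reed_frost(99, 1, 0.02, rng)) for _ in range(1000)]
major = sum(1 for size in sizes if size > 20) / len(sizes)
print(f"fraction of runs with more than 20 cases: {major:.2f}")
```

Unlike the deterministic model, repeated runs show that an introduction can fizzle out by chance even when transmission is above threshold.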

A variety of other potentially useful approaches to epidemiological issues have not yet attracted the attention of many in the mathematics and computer science community, and some relevant methods of computer science and mathematics are not widely known among epidemiologists. For example, many scientific fields, and in particular molecular biology, have made extensive use of methods of discrete mathematics, as broadly defined, especially those that exploit the power of modern computational tools. These efforts have been guided by algorithms, models, and concepts of theoretical computer science that make these tools more available than they were in the past. Yet these methods remain largely unused in epidemiology.

One major development in epidemiology makes the tools of discrete mathematics and theoretical computer science especially relevant: the use of Geographic Information Systems. These systems allow analytic approaches to spatial information that were not used previously. Another development is the availability of large and disparate computerized databases on subjects related to disease. Modern methods of data mining can clearly be of use here. Data mining methods of cluster analysis, visualization, and learning, grounded in theoretical computer science and statistics, are relevant to spatial/temporal patterning, the recognition that a disease has reached epidemic stage, and the construction of exposure categories.
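As one simple example of cluster analysis applied to spatial data, the sketch below runs Lloyd's k-means algorithm on case locations. It is only an illustration: the coordinates are hypothetical, the number of clusters `k` must be chosen in advance, and initializing from the first `k` points is a simplification. Dedicated spatial-scan and hot-spot methods are usually preferred for disease clusters.

```python
import math


def kmeans(points, k, iterations=100):
    """Lloyd's k-means on 2-D points; initial centers are the first k points."""
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        # Update step: move each center to the mean of its assigned points
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append((sum(x for x, _ in members) / len(members),
                                    sum(y for _, y in members) / len(members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged: assignments no longer move the centers
        centers = new_centers
    return centers, labels


# Hypothetical case locations (km east, km north)
case_locations = [(1.0, 1.2), (8.0, 8.1), (0.8, 1.0), (1.1, 0.9), (7.9, 8.3), (8.2, 7.8)]
centers, labels = kmeans(case_locations, k=2)
print(labels, centers)
```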

Discrete mathematics and theoretical computer science are also relevant to the increasing emphasis in epidemiology on an evolutionary point of view. For a full understanding of important issues (immune responses of hosts; coevolution of hosts, parasites, and vectors; drug response; and antibiotic resistance), biologists are increasingly taking approaches that model the impact of mutation, selection, population structure, selective breeding, and genetic drift on the evolution of infectious organisms and their various hosts. Epidemiologists are only beginning to become aware of some of the computer science tools available for analyzing these complex problems; among such tools are new methods of classification and phylogenetic tree reconstruction grounded in concepts and algorithms of theoretical computer science and developed in the course of the explosion in "computational biology."

A great deal more needs to be done in this area. Many phylogenetic techniques were developed for more traditional, well-behaved evolutionary problems. The traditional model of a binary tree with a small number of species as the leaves cannot capture the "quasi-species" nature of many viruses and the very high substitution rates of retroviruses. Collaboration between mathematical scientists and epidemiologists in the development and use of phylogenetic methods in epidemiology is likely to take both fields in new and fruitful directions. Mathematical models that capture the process of one protein interfering with the folding of another seem relevant to such diseases as bovine spongiform encephalopathy (mad cow disease), Alzheimer's disease, Huntington's disease, and amyloidosis. These models might shed light on important epidemiological questions involving the crossing of species barriers and dose-response relationships. Models of protein folding, often based on global minimization of energy functions, increasingly make use of methods blending discrete mathematics and theoretical computer science as well.
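For readers new to phylogenetics, the sketch below shows one of the "traditional" methods referred to above: UPGMA (average-linkage clustering), which builds a rooted binary tree from pairwise distances under a molecular-clock assumption. It is exactly the kind of well-behaved model that struggles with quasi-species; the strain names and distances are hypothetical.

```python
import itertools


def upgma(names, matrix):
    """UPGMA (average-linkage) tree from a symmetric distance matrix.

    Returns a Newick string whose branch lengths assume a molecular clock,
    i.e. every leaf sits at the same distance from the root.
    """
    # Active clusters: id -> (Newick subtree, number of leaves, height)
    clusters = {i: (name, 1, 0.0) for i, name in enumerate(names)}
    d = {(i, j): matrix[i][j] for i, j in itertools.combinations(range(len(names)), 2)}
    next_id = len(names)
    while len(clusters) > 1:
        (a, b), dab = min(d.items(), key=lambda item: item[1])  # closest pair
        tree_a, size_a, h_a = clusters.pop(a)
        tree_b, size_b, h_b = clusters.pop(b)
        height = dab / 2.0
        merged = f"({tree_a}:{height - h_a:g},{tree_b}:{height - h_b:g})"
        # Size-weighted average distance from the new cluster to each remaining one
        new_d = {}
        for c in clusters:
            dac = d[(min(a, c), max(a, c))]
            dbc = d[(min(b, c), max(b, c))]
            new_d[(c, next_id)] = (size_a * dac + size_b * dbc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        d.update(new_d)
        clusters[next_id] = (merged, size_a + size_b, height)
        next_id += 1
    return next(iter(clusters.values()))[0] + ";"


# Hypothetical pairwise distances between four viral strains
strains = ["A", "B", "C", "D"]
distances = [[0, 2, 6, 10],
             [2, 0, 6, 10],
             [6, 6, 0, 10],
             [10, 10, 10, 0]]
print(upgma(strains, distances))
```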

One of the scheduled DIMACS workshops will explore the epidemiology and evolution of influenza (shown here is the crystal structure of the influenza virus matrix protein at pH 4.0). The rapid evolution of influenza leads to increasingly complex mathematical and computer models, which as a result often require large computers for implementation and analysis.

Core structure of Gp2 from Ebola virus. Mathematical models can help in the preparation of response plans and intervention strategies for emerging or re-emerging diseases, such as Ebola virus or foot-and-mouth disease.

Special Focus Events: 2002
Although several events will take place earlier, DIMACS will officially launch the special focus with the International Conference on Computational and Mathematical Epidemiology, June 28 to July 2, 2002. Rita Colwell, director of the National Science Foundation, and Nancy Cox, chief of the Influenza Branch at the Centers for Disease Control and Prevention, will give keynote addresses. The conference chairs are Simon Levin (Princeton) and Fred Roberts (Rutgers).

During the first two years of the special focus, a series of tutorials will introduce computer and mathematical scientists to relevant epidemiological and biological topics, and epidemiologists and biologists to relevant methods of computer science and mathematics. Two tutorials are scheduled for 2002. The first, "Dynamic Models of Epidemiological Problems," will be held June 24-28, immediately preceding the international conference.

Organized by Carlos Castillo-Chavez (Cornell), Herbert Hethcote (Iowa), and Pauline van den Driessche (University of Victoria), this tutorial will develop mathematical models for the spread of infectious disease by starting with the most basic models and then increasing the complexity to include host-vector situations, multiple groups, age-based groups, spatial spread, differential-delay equations, and functional differential equations. The models presented will address such issues as thresholds, basic reproduction numbers, stability of equilibria, Hopf bifurcation to periodic solutions, multiple endemic equilibria, and chaotic behavior. Applications to specific diseases, such as tuberculosis, influenza, rubella, chickenpox, whooping cough, and HIV/AIDS, will be included. A goal is to bring working mathematicians up to the research frontiers in mathematical epidemiology.
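The threshold idea can be seen in the simplest model with births and deaths: the SIR model with equal per-capita birth and death rate mu, written in population fractions. Its basic reproduction number is R0 = beta / (gamma + mu); the standard result is that the disease-free state is stable when R0 < 1, while for R0 > 1 the infection settles at an endemic equilibrium. The sketch computes R0 and that equilibrium and checks that the right-hand side vanishes there. Parameter values are illustrative only.

```python
def sir_with_births(beta, gamma, mu):
    """Threshold analysis for the SIR model with equal birth and death rate mu.

    Fractions s, i, r satisfy
        ds/dt = mu - beta*s*i - mu*s
        di/dt = beta*s*i - (gamma + mu)*i
        dr/dt = gamma*i - mu*r
    Returns (R0, equilibrium): the endemic state when R0 > 1,
    otherwise the disease-free state (1, 0, 0).
    """
    r0 = beta / (gamma + mu)
    if r0 <= 1:
        return r0, (1.0, 0.0, 0.0)
    s_star = 1.0 / r0                 # from di/dt = 0 with i > 0
    i_star = mu * (r0 - 1.0) / beta   # from ds/dt = 0
    return r0, (s_star, i_star, 1.0 - s_star - i_star)


def sir_rhs(state, beta, gamma, mu):
    """Right-hand side of the model, used to verify the equilibrium."""
    s, i, r = state
    return (mu - beta * s * i - mu * s,
            beta * s * i - (gamma + mu) * i,
            gamma * i - mu * r)


r0, equilibrium = sir_with_births(beta=0.5, gamma=0.1, mu=1e-4)
residual = max(abs(x) for x in sir_rhs(equilibrium, 0.5, 0.1, 1e-4))
print(f"R0 = {r0:.2f}, equilibrium (s, i, r) = {equilibrium}, residual {residual:.1e}")
```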

The other 2002 tutorial, "Epidemiology for Mathematical Scientists," will be held August 26-30, with two components. "Introduction to Epidemiological Studies," organized by David Ozonoff (Boston University), with the help of consultant Robert Horsburgh (also of BU), will introduce mathematicians to the terminology and concepts used by epidemiologists to investigate patterns of disease occurrence in populations. Emphasized will be the types of study designs, measures of association, and types of systematic and random error that concern current epidemiological practice. The second component, "The Foundations of Molecular Genetics for Non-Biologists," organized by William Sofer (Rutgers), will introduce the basics of molecular biology and genetics to those who have been exposed only to classical biology (or who have had no training in biology at all). Topics to be covered include polymers, DNA, RNA, proteins, transcription, translation, DNA replication, genetics, genetic engineering, and the molecular biology of pathogenesis.
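As a taste of the "measures of association" such a tutorial covers, the sketch below computes the risk ratio and odds ratio from a 2x2 exposure-by-disease table, with approximate 95% confidence intervals on the log scale (the Wald/Woolf form, one common choice among several). The cohort counts are hypothetical, and all four cells are assumed to be nonzero.

```python
import math


def measures_of_association(a, b, c, d, z=1.96):
    """Risk ratio and odds ratio from a 2x2 table with Wald 95% intervals.

                  disease   no disease
        exposed      a          b
        unexposed    c          d

    Assumes all four cells are nonzero.
    """
    risk_ratio = (a / (a + b)) / (c / (c + d))
    se_log_rr = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

    def interval(estimate, se):
        # Symmetric interval on the log scale, transformed back
        return (math.exp(math.log(estimate) - z * se),
                math.exp(math.log(estimate) + z * se))

    return {"risk_ratio": (risk_ratio, interval(risk_ratio, se_log_rr)),
            "odds_ratio": (odds_ratio, interval(odds_ratio, se_log_or))}


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed fall ill
result = measures_of_association(30, 70, 10, 90)
print(result)
```

The risk ratio is natural for cohort studies; the odds ratio is the measure that case-control designs can estimate directly.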

The March 22-23 working group meeting mentioned at the beginning of this article is titled "Mathematical Sciences Methods for the Study of Deliberate Releases of Biological Agents and their Consequences." The organizers, Carlos Castillo-Chavez (Cornell) and Fred Roberts (Rutgers), expect the discussions to help identify the challenges posed by bioterrorism, as well as the potential uses of mathematical sciences methods to fight it. The group will consider preventive measures, such as vaccination, vaccine dilution, and antibiotic and vaccine stockpiling, and responsive strategies, such as the isolation (quarantine) of individuals, buildings, populations, and regions. Each of these measures raises issues that are highly amenable to mathematical modeling. Other issues to be addressed include the rapid control of mass transportation systems and the systematic surveillance of food and water supplies.
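A textbook example of how modeling informs vaccination policy is the final-size relation of the simple SIR model, z = 1 - exp(-R z), where z is the fraction of susceptibles eventually infected and R is the effective reproduction number. The sketch assumes homogeneous mixing and a perfect vaccine given to a random fraction p of the population, so that R = R0(1 - p); the herd-immunity threshold is then p = 1 - 1/R0. The value R0 = 3 is purely illustrative.

```python
import math


def final_size(r0, vaccinated_fraction=0.0, tol=1e-12, max_iter=100_000):
    """Fraction of unvaccinated people eventually infected in a homogeneously
    mixing SIR epidemic, solving z = 1 - exp(-R * z) by fixed-point iteration.

    Assumes a perfect vaccine given to a random fraction of the population,
    so the effective reproduction number is R = r0 * (1 - vaccinated_fraction).
    """
    r_eff = r0 * (1.0 - vaccinated_fraction)
    if r_eff <= 1.0:
        return 0.0  # at or below threshold: no major outbreak in a large population
    z = 1.0  # iterating downward from 1 converges to the positive root
    for _ in range(max_iter):
        z_next = 1.0 - math.exp(-r_eff * z)
        if abs(z_next - z) < tol:
            return z_next
        z = z_next
    return z


r0 = 3.0
critical = 1.0 - 1.0 / r0  # herd-immunity vaccination threshold
for p in (0.0, 0.3, 0.5, critical):
    print(f"vaccinated {p:.2f}: attack rate among unvaccinated {final_size(r0, p):.3f}")
```

Isolation measures can be caricatured the same way, as a reduction in the effective reproduction number, although realistic quarantine models are considerably more detailed.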

This group is one of several working groups that are central to the DIMACS special focus: new international, interdisciplinary partnerships that bring computer and mathematical scientists together with biological scientists and epidemiologists. These deliberately small groups will come together for several meetings at DIMACS. Subgroups will form to investigate specific problem areas and will return to DIMACS for periods of intense research collaboration. Participation in a working group is by invitation only, but expressions of interest from the research community are welcome.

Another working group session scheduled for 2002 is "Adverse Event/Disease Reporting, Surveillance, and Analysis"; organized by Michael Fredman, Donald Hoover, and David Madigan (all of Rutgers), the meeting will be held October 14-18. Disease or event reporting and surveillance systems are a primary epidemiological data source for studying, and raising alerts about, adverse reactions to medication, emerging diseases, and bioterrorist attacks. These systems synthesize data from millions of reports. Challenges for this group include application of computational and statistical methods for the early detection of emerging trends, adaptation of streaming-data algorithms designed to set off early-warning alarms, application of data mining methods, development of causal inferential methods in the absence of controls, study of ways to eliminate bias, and design of verification methodology.
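One standard way to set off an early-warning alarm on a stream of counts is the cumulative sum (CUSUM) chart, which accumulates excesses over a baseline and signals when the running total crosses a threshold. The sketch below is a one-sided CUSUM; the baseline mean, the slack `k`, the decision threshold `h`, and the daily report counts are all hypothetical, and nothing here describes the algorithms the working group will actually study.

```python
def cusum_alarms(counts, baseline_mean, k, h):
    """One-sided (upper) CUSUM for detecting an increase in daily counts.

    S_t = max(0, S_{t-1} + x_t - baseline_mean - k). An alarm is raised on
    each day where S_t exceeds the decision threshold h, after which the
    statistic is reset to zero. Returns the indices of alarm days.
    """
    s = 0.0
    alarms = []
    for day, x in enumerate(counts):
        s = max(0.0, s + x - baseline_mean - k)
        if s > h:
            alarms.append(day)
            s = 0.0
    return alarms


daily_reports = [5, 6, 4, 5, 7, 5, 6, 9, 12, 15, 18, 20]
print(cusum_alarms(daily_reports, baseline_mean=5.0, k=1.0, h=5.0))
```

Larger `h` lowers the false-alarm rate at the cost of slower detection; choosing it is a trade-off that in practice is calibrated on historical data.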

Still another set of issues, arising from the use of natural language in reporting systems, includes the need to devise effective methods for translating natural-language input into formats suitable for statistical analysis; prior work on machine natural-language processing and information retrieval is relevant. Areas for possible subgroups include drug reactions, emerging diseases, and bioterrorism.

Also to meet in 2002 is a working group on the topic "Analogies Between Computer Viruses and Immune Systems and Biological Viruses and Immune Systems"; the session, organized by Lora Billings (Montclair State University), Stephanie Forrest (University of New Mexico), Alun Lloyd (Institute for Advanced Study), and Ira Schwartz (Naval Research Lab), will be held June 10-14. The study of analogies between computer viruses and biological viruses, and associated immune systems, offers promise for the understanding of systems of both types. Early studies of computer viruses spreading through a network were based on models of population dynamics; little theoretical work was done until recently. Recent work supports the view that the methods of theoretical computer science, combined in clever new ways with the traditional tools of population biology, might cast light on both computer and natural viruses. This group will compare the efficacy of discrete and continuous models, explore the addition of time delays and/or stochastic perturbations, and compare the mechanisms of spread at the molecular level. The group will model the spread of a computer virus and assess "vaccination" strategies in a robust computer network.
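To illustrate how "vaccination" strategies might be assessed on a computer network, the sketch below uses a worst-case, deterministic spread rule: every infected machine infects all of its non-immune neighbors, so the outbreak size is simply the set reachable from the seed without passing through immune nodes. It compares patching the best-connected machine with the average over patching a single machine chosen at random. The network, seed, and budget are hypothetical, and realistic worm models add transmission probabilities and timing.

```python
from collections import deque


def outbreak_size(adjacency, seeds, immune=frozenset()):
    """Number of nodes eventually infected when every infected node infects
    all of its non-immune neighbours (worst-case SI spread), computed as a
    breadth-first search that never enters immune nodes.
    """
    infected = {s for s in seeds if s not in immune}
    queue = deque(infected)
    while queue:
        node = queue.popleft()
        for nbr in adjacency[node]:
            if nbr not in infected and nbr not in immune:
                infected.add(nbr)
                queue.append(nbr)
    return len(infected)


def immunize_highest_degree(adjacency, budget):
    """Targeted strategy: 'patch' the `budget` best-connected nodes."""
    ranked = sorted(adjacency, key=lambda n: len(adjacency[n]), reverse=True)
    return frozenset(ranked[:budget])


# Hypothetical undirected network: two hubs joined through a router "A"
adjacency = {
    "A": ["B", "C"],
    "B": ["A", "b1", "b2", "b3"], "C": ["A", "c1", "c2", "c3"],
    "b1": ["B"], "b2": ["B"], "b3": ["B"],
    "c1": ["C"], "c2": ["C"], "c3": ["C"],
}
targeted = outbreak_size(adjacency, ["b1"], immunize_highest_degree(adjacency, 1))
random_avg = sum(outbreak_size(adjacency, ["b1"], frozenset({n}))
                 for n in adjacency) / len(adjacency)
print(f"no patching: {outbreak_size(adjacency, ['b1'])}, "
      f"targeted: {targeted}, random (average): {random_avg:.2f}")
```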

Workshops, to be held annually during the years of the special focus, will identify areas for research, involve large groups of researchers in the activity, and introduce many people to the field. They will have formal programs, will be widely publicized to the community, and can be expected to spawn new working groups. Some working groups will also have associated one-day workshops. Attendance at tutorials and workshops will be open to all.

A workshop titled "Pathogenesis of Infectious Disease: Host-Pathogen Dynamics" will be held September 23-27, 2002; the organizers are Denise Kirschner (University of Michigan) and Alan Perelson (Los Alamos). Components of host-pathogen systems are so numerous and their interactions so complex that intuition alone is insufficient for a full understanding of the dynamics of the interactions; mathematical modeling becomes an important experimental tool. Recently, models of host interactions with microbes have begun to appear, including models that explore interactions at the bacteria-host level. These models have studied, for example, antimicrobial chemotherapy, urinary tract bacterial infections, mycoparasite-immune dynamics, and tuberculosis. Models of persistent viral infections, namely HIV-host models, also have a successful recent history. Many of the key results that have shaped our recent understanding of the T-cell and viral dynamics in HIV infection have come with the help of mathematical modeling. Along with these developments, the workshop will explore a new line of research that aims to link information obtained at the host level to predict both prevalence and incidence of disease at the population level. Although the special focus will concentrate on infectious diseases, there will be some emphasis on modeling of noninfectious diseases, such as cancer; plans to date include a working group on "Computational Biology of Tumor Progression," organized by Martin Nowak (Institute for Advanced Study), tentatively set to meet for the first time in 2002 or 2003.
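The within-host HIV models mentioned above are often built on a "target-cell-limited" system of the following form, sketched here under stated assumptions: uninfected T cells are produced at rate `lam` and die at rate `d`, become infected at rate `beta*T*V`, infected cells die at rate `delta`, and virus is produced at rate `p` per infected cell and cleared at rate `c`. Its within-host reproduction number is R0 = beta*lam*p/(d*delta*c). The parameter values below are illustrative, not fitted to patient data.

```python
def hiv_rhs(state, lam, d, beta, delta, p, c):
    """Target-cell-limited model: uninfected T cells, infected cells, virus."""
    t_cells, infected, virus = state
    return (lam - d * t_cells - beta * t_cells * virus,
            beta * t_cells * virus - delta * infected,
            p * infected - c * virus)


def simulate_hiv(state, params, dt=0.01, days=100.0):
    """Forward-Euler integration; returns the state after `days` days."""
    for _ in range(int(round(days / dt))):
        deriv = hiv_rhs(state, **params)
        state = tuple(x + dt * dx for x, dx in zip(state, deriv))
    return state


params = dict(lam=10.0, d=0.01, beta=2e-5, delta=0.5, p=500.0, c=3.0)
r0 = params["beta"] * params["lam"] * params["p"] / (params["d"] * params["delta"] * params["c"])

# Infected steady state (exists when r0 > 1)
t_star = params["c"] * params["delta"] / (params["beta"] * params["p"])
i_star = (params["lam"] - params["d"] * t_star) / params["delta"]
v_star = params["p"] * i_star / params["c"]

final = simulate_hiv((1000.0, 0.0, 1e-3), params)
print(f"R0 = {r0:.2f}; steady state T*={t_star:.0f}, I*={i_star:.1f}, V*={v_star:.0f}")
print(f"state after 100 days: {final}")
```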

Participation in the Special Focus
DIMACS hopes to have funds available for short- and long-term visitors during the special focus. Graduate students and recent PhDs are especially encouraged to participate in the visiting program. We also anticipate modest funds for supporting students and new researchers who wish to attend workshops and tutorials.

The Organizing Committee for the special focus is a microcosm of the types of relationships DIMACS hopes to stimulate through this activity. It includes computer scientists and mathematicians---Martin Farach-Colton, Michael Fredman, S. Muthukrishnan, and Fred Roberts (chair), all of Rutgers---and two statisticians---Donald Hoover and David Madigan (both of Rutgers). Epidemiology and public health are represented by David Ozonoff (Boston University), Burton Singer (Princeton), and Daniel Wartenberg (University of Medicine and Dentistry of NJ). Sunetra Gupta (Oxford), David Krakauer (Institute for Advanced Study), and Simon Levin (Princeton) are biologists.
