Inside Life Science
A New Use for Census Data: Disease Simulations
By Stephanie Dutchen
Posted May 18, 2011
Did you know that when you filled out your census form, you helped computer scientists model how diseases spread in the United States?
Over the last four years, National Institutes of Health-supported researchers at RTI International in North Carolina have been transforming anonymized data from the 2000 Census, which described the country's 281 million people and 116 million housing units, into a virtual U.S. population. They finished the "synthetic population" in 2009, and they will be updating it as the 2010 Census results come out.
The scientists developed the synthetic population as part of NIH's Models of Infectious Disease Agent Study (MIDAS), a network of researchers who use computers to model infectious diseases with the goal of improving public health. By integrating the population into their computer models, MIDAS researchers can better simulate the spread of an infectious outbreak through a community and examine the best ways to intervene.
The synthetic population doesn't exactly reproduce your hometown in silico, but it comes pretty close. The Census protects citizens' privacy, so the RTI researchers don't (in fact, can't) duplicate John Smith from Manhattan or Jane Doe from Iowa City. Nor do they take each neighborhood home, apartment building, college dorm, family farm and sprawling ranch and plop it down at its exact address.
But the Census data did give them the population, household sizes, family incomes and residents' ages and ethnicities for every town, county and state. Plugging all this information into their computers allowed the researchers to create a mirror-country that has the same overall demographics as our actual one.
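The article doesn't say which algorithm RTI used to build its mirror country. One widely used technique for making a synthetic population that matches published census totals is iterative proportional fitting (IPF): start from a small "seed" table of sample households cross-classified by, say, household size and income, then repeatedly rescale its rows and columns until both sets of totals match the county's figures. The minimal Python sketch below assumes a two-way table; the function name `ipf`, the seed counts and the county totals are all hypothetical, not Census data.

```python
def ipf(seed, row_targets, col_targets, tol=1e-9, max_iter=1000):
    """Iterative proportional fitting on a two-way table.

    Alternately rescales the rows and columns of `seed` so that its row
    sums match `row_targets` and its column sums match `col_targets`.
    Assumes every seed cell is positive and both target lists share the
    same grand total.
    """
    table = [[float(v) for v in row] for row in seed]
    n_rows, n_cols = len(table), len(table[0])
    for _ in range(max_iter):
        # Row step: scale each row to its target total.
        for i in range(n_rows):
            factor = row_targets[i] / sum(table[i])
            table[i] = [v * factor for v in table[i]]
        # Column step: scale each column to its target total.
        for j in range(n_cols):
            factor = col_targets[j] / sum(table[r][j] for r in range(n_rows))
            for r in range(n_rows):
                table[r][j] *= factor
        # Columns now match exactly; stop once the rows match too.
        if all(abs(sum(table[i]) - row_targets[i]) < tol for i in range(n_rows)):
            break
    return table


# Hypothetical seed sample: rows are household sizes (1, 2, 3+),
# columns are income bands (lower, higher).
seed = [[30, 10],
        [20, 25],
        [10, 30]]
# Hypothetical county totals; both lists sum to 1,000 households.
households_by_size = [400, 350, 250]
households_by_income = [450, 550]

fitted = ipf(seed, households_by_size, households_by_income)
for size, row in zip(["1", "2", "3+"], fitted):
    print(size, [round(v, 1) for v in row])
```

IPF keeps the associations present in the seed sample (its cross-category odds ratios) while forcing the totals to agree with the census. A full pipeline would still need to turn the fractional counts into whole households and fill each one with individual residents; that step is left out here.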
"The synthetic population looks statistically exactly like the real population," says NIH's Irene Eckstrand, who directs the MIDAS program. "It has all the characteristics of real communities but doesn't invade anyone's privacy."
The number and types of houses in your county match those in the corresponding synthetic county. And each home is on an appropriate patch of land, not in a lake or the middle of an airport. By incorporating geospatial data about such features as road locations, ground slope and land cover, the researchers further refined where virtual residents live. This helps modelers more realistically simulate proximity to neighbors, health care facilities and insect-harboring wooded areas, all of which can influence disease spread.
Because farm animals are also potential disease carriers, the researchers have used a similar approach to create synthetic poultry and pig populations.
Translating to the Real World
Disease modelers can manipulate all or selected parts of the new, ready-made synthetic population. They can model the entire country or just one town.
They can program the virtual citizens—or agents, as modelers call them—to behave in certain ways. For instance, in an outbreak simulation, one agent may get vaccinated while another refuses.
Having synthetic populations at the ready can help speed up disease-spread simulations and allow modelers and policymakers to study real outbreaks as they happen.
The synthetic population will also help modelers study the impact of social networks on disease spread. Researchers can track where agents work or go to school, who they live with and who they're likely to meet running errands. Since many infectious diseases pass from person to person through contact, studying these social patterns in models should help researchers understand how diseases spread in the real world.
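The article doesn't describe how the MIDAS models work internally. As a rough illustration of the ideas above (agents who choose whether to get vaccinated, and infection passed along among people who share a home, a workplace or a school), here is a minimal agent-based sketch in Python. Everything in it is hypothetical: the `Agent` fields, the functions `build_population` and `simulate`, the random assignment of agents to places, the transmission probability and the assumption that vaccination gives complete protection are invented for illustration, not taken from RTI's synthetic population or any MIDAS model.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    household: int
    place: int              # hypothetical: one workplace or school per agent
    vaccinated: bool = False
    state: str = "S"        # "S" susceptible, "I" infectious, "R" recovered
    days_left: int = 0      # days of infectiousness remaining


def build_population(n_households, household_size, n_places, vaccination_rate, rng):
    """Make a toy synthetic town; each agent independently decides to vaccinate."""
    agents = []
    for h in range(n_households):
        for _ in range(household_size):
            agents.append(Agent(household=h,
                                place=rng.randrange(n_places),
                                vaccinated=rng.random() < vaccination_rate))
    return agents


def simulate(agents, p_transmit, infectious_days, n_seeds, rng, max_days=365):
    """Run a daily SIR-style outbreak; return how many agents were ever infected."""
    # Index contact groups: everyone sharing a home, and everyone sharing a place.
    groups = {}
    for idx, a in enumerate(agents):
        groups.setdefault(("home", a.household), []).append(idx)
        groups.setdefault(("place", a.place), []).append(idx)

    # Start the outbreak in a few randomly chosen unvaccinated agents.
    unvaccinated = [i for i, a in enumerate(agents) if not a.vaccinated]
    for i in rng.sample(unvaccinated, n_seeds):
        agents[i].state, agents[i].days_left = "I", infectious_days

    for _ in range(max_days):
        infectious = [a for a in agents if a.state == "I"]
        if not infectious:
            break
        newly_infected = set()
        for a in infectious:
            for key in (("home", a.household), ("place", a.place)):
                for j in groups[key]:
                    b = agents[j]
                    # Sketch assumption: vaccination gives complete protection.
                    if b.state == "S" and not b.vaccinated and rng.random() < p_transmit:
                        newly_infected.add(j)
            a.days_left -= 1
            if a.days_left == 0:
                a.state = "R"
        # New infections become infectious starting the next day.
        for j in newly_infected:
            agents[j].state, agents[j].days_left = "I", infectious_days

    return sum(a.state != "S" for a in agents)


# Same toy town layout, two hypothetical vaccination rates, same random seed.
for rate in (0.0, 0.6):
    rng = random.Random(42)
    town = build_population(500, 3, 40, rate, rng)
    cases = simulate(town, p_transmit=0.01, infectious_days=5, n_seeds=5, rng=rng)
    print(f"vaccination rate {rate:.0%}: {cases} of {len(town)} agents infected")
```

Because each contact group is an explicit list of agents, replacing the random place assignments with household, school and workplace links drawn from a synthetic population would change who meets whom without changing the simulation loop.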
Next, the researchers want to create international synthetic populations. They've already finished one for the 110 million people in Mexico, and they're currently working on another one for India. Multi-country models would allow researchers to better simulate the spread of diseases across national borders.