What Is an Ontology?
Biomedical researchers face a growing problem in trying to manage their knowledge. As scientists in different disciplines—even just in different labs—conduct experiments and exchange information, they gather different kinds of data and interpret terms in different ways, sometimes without realizing it.
Imagine you're a biologist working on, say, brain function in chickens. Before you start your first experiment, you want to find out what research has been done on chicken brains.
Sounds easy enough, right? First you search the scientific literature—all the journal articles that have been published in your area of interest. Then you tackle the databases, the huge online repositories that store everything scientists know about chicken brains.
Okay, so you've found hundreds of articles and gigabytes of data. But even when you pare down the results to the most relevant information, you may not be able to interpret or compare them. Suppose your database search pulls up charts from two studies with columns labeled "beak length." Well, what did the researchers mean by that? The numbers could be averages, they could be in millimeters or centimeters, and they could describe chicks or roosters. If you don't know what the numbers represent, the data is meaningless to you.
And if that weren't enough, what you call a chicken may not be what another researcher calls a chicken. This is a rampant problem in gene research, where different scientists call the same DNA segments by different names or use the same names to refer to different segments. If you don't realize that Dr. Smith's data on what he calls a chicken is actually about what you would call an elephant, "you can come up with some really interesting but bogus conclusions," says Karin Remington, who directs the NIGMS Center for Bioinformatics and Computational Biology.
Computer scientists want to make it much easier for biologists to understand what they're really looking at and to share what they know. They're building virtual libraries called ontologies that organize biological knowledge using a universal language.
By establishing a set of official terms, ontologies allow biologists across labs, specialties and countries to share a common vocabulary. The Web Ontology Language, or OWL, is a popular choice for writing them. An ontology gives every protein, every gene, every biological process, a standard name. Everyone will call that beaky, feathered creature that goes bok bok a "chicken," and the term won't be used to describe anything else. Not only can you consistently label text such as journal articles, you can also tag data tables, medical scans, segments of DNA and other objects. Now you can be certain you're comparing apples to apples and oranges to oranges, or chickens to chickens. And once these standard terms are in place, you can compare multiple ontologies to uncover new associations. Perhaps you'll be the first to notice that "smarter" chickens have a gene mutation that lengthens their beaks.
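For readers who like to see ideas in code, here is a minimal Python sketch of a shared vocabulary. It is only an illustration, not how any real ontology is stored: the table of labels and the function names `standard_term` and `same_thing` are made up for this example.

```python
# Hypothetical lookup table: every label a lab might use points to one standard term.
# Keys are lowercase so that "Chicken" and "chicken" are treated alike.
STANDARD_TERMS = {
    "chicken": "chicken",
    "gallus gallus": "chicken",
    "domestic fowl": "chicken",
    "elephant": "elephant",
    "loxodonta africana": "elephant",
}


def standard_term(label):
    """Return the standard term for a label, or None if the label is unknown."""
    return STANDARD_TERMS.get(label.strip().lower())


def same_thing(label_a, label_b):
    """True only when both labels are known and map to the same standard term."""
    term_a = standard_term(label_a)
    term_b = standard_term(label_b)
    return term_a is not None and term_a == term_b
```

With a table like this, `same_thing("Gallus gallus", "Chicken")` is true, while comparing a chicken label with an elephant label is false. An unknown label never matches anything, so a typo can't quietly turn a chicken into an elephant.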
An ontology also establishes what biologists know about the objects they study. For example, a chicken:
- Is a domesticated animal used for food.
- Lays eggs if female.
- Cannot fly long distances.
In the same way, a particular gene may be tagged as "makes proteins that strengthen the cell wall" or "located on chromosome two."
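One way to picture this kind of tagging is as a set of statements attached to each term. The Python sketch below is an assumption-laden toy: the names `annotate` and `terms_with` are invented here, and `gene_A` and `gene_B` are placeholder genes, not real ones.

```python
from collections import defaultdict

# Each term maps to the set of statements recorded about it.
ANNOTATIONS = defaultdict(set)


def annotate(term, statement):
    """Record one statement about a term."""
    ANNOTATIONS[term].add(statement)


def terms_with(statement):
    """Return, in alphabetical order, every term tagged with the statement."""
    return sorted(term for term, tags in ANNOTATIONS.items() if statement in tags)


# Statements taken from the chicken example above.
annotate("chicken", "is a domesticated animal used for food")
annotate("chicken", "lays eggs if female")
annotate("chicken", "cannot fly long distances")

# Placeholder genes (hypothetical) carrying the two example tags.
annotate("gene_A", "makes proteins that strengthen the cell wall")
annotate("gene_A", "located on chromosome two")
annotate("gene_B", "located on chromosome two")
```

Once the tags are in place, a question like "which genes sit on chromosome two?" becomes a simple lookup: `terms_with("located on chromosome two")`.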
Another benefit is that ontologies organize terms to show how objects and concepts relate to each other. Ontologists may depict these associations as a tree, a flow chart or the nested folder structure on your computer. These visuals make it easier to understand that a chicken is a kind of bird, cell division is a kind of biological process, and the cerebellum is part of the brain. Defining these relationships helps computer modelers incorporate them to bring simulations closer to reality.
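These relationships can be sketched as simple links that point from each term to its parent. The Python below is a rough illustration under stated assumptions: the helper names `ancestors` and `is_a` are invented, and the "bird is a kind of animal" link is added here only to show a chain of more than one step.

```python
# "Kind of" links: each term points to the broader category it belongs to.
IS_A = {
    "chicken": "bird",
    "bird": "animal",  # added for illustration, not from the examples above
    "cell division": "biological process",
}

# "Part of" links: each term points to the larger structure that contains it.
PART_OF = {
    "cerebellum": "brain",
}


def ancestors(term, relation):
    """Follow one kind of link upward and list every term reached, nearest first."""
    found = []
    while term in relation:
        term = relation[term]
        if term in found:  # stop if the links ever loop back on themselves
            break
        found.append(term)
    return found


def is_a(term, category):
    """True if the category is reachable from the term through "kind of" links."""
    return category in ancestors(term, IS_A)
```

Because the links chain together, `is_a("chicken", "animal")` is true even though nobody recorded that fact directly. Keeping "kind of" and "part of" in separate tables also matters: the cerebellum is part of the brain, but it is not a kind of brain, so `is_a("cerebellum", "brain")` is false.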
That's the ideal, anyway. As ontologists try to wrangle all this scientific knowledge into tidy categories, they must clear major hurdles. For instance, researchers don't always agree on terminology. Nor do they necessarily have the same opinion on a protein's function or the connections between certain genes and human diseases. These roles aren't always clear, especially at the cutting edge of discovery. In fact, the sociology of ontology building—how to get communities to develop and agree on standards—is one of the most challenging and rewarding areas of research, says Peter Lyster, also of the Center for Bioinformatics and Computational Biology.
There probably will never be a single, undisputed ontology that contains all scientific knowledge. But that isn't the goal. Instead, it's to develop a series of ontologies that are useful to scientists in specialized fields and that are indexed in one place. It's also to convince scientists around the world that having these ontologies is not only helpful, it's essential. Overcoming these challenges won't be easy, but if ontologists succeed, you'll have an easier time getting started on the next breakthrough in chicken brain research.
This article also appears in Inside Life Science.