
Marriage of Math and Genetics Forges New Scientific Landscape -- Part 1 of 2

A computer scientist and geneticist team up to produce a list of genes implicated in diseases.

Researchers (left to right) Randy Jirtle, Philippe Luedi and Alex Hartemink used computers from the Duke Shared Cluster Resource to identify genes likely to contribute to diseases.

When a team of Duke researchers published a list of genes likely to contribute to human ailments ranging from Alzheimer's and autism to diabetes and obesity, it shed light not only on diseases that afflict millions of people but also on how research that may lead to new treatments and cures is changing across the university.

The change is revealed in the diverse backgrounds of Duke professors Randy Jirtle and Alexander Hartemink and graduate student Philippe Luedi, who published the June 2005 study in Genome Research: Jirtle is a well-established geneticist. Hartemink is a computer scientist and former Rhodes Scholar. Luedi's training in bioinformatics stands at the intersection of the two disciplines.

Such a marriage of computation and biology is becoming increasingly common at Duke, reflecting a broader shift towards research across familiar scientific boundaries.

Jirtle and Hartemink describe their collaboration as leading to a sort of genetic "treasure map." The treasures are imprinted genes, valuable as candidates for disease-causing genetic abnormalities but buried among the tens of thousands of genes in the human genome.

Professor Alex Hartemink explains how the computer science method of "machine learning" can be applied to the genetics problem of identifying imprinted genes.

"If I tell you there's an island the size of Greenland, and I have buried 600 treasure chests somewhere on the island, you know nothing," Hartemink explains. "We've identified genetic regions, or parts of the landscape, that are more likely to be where the 'treasure' of imprinted genes is buried. In that sense it's like a treasure map."

Their map, Jirtle says, narrows the search for imprinted genes from tens of thousands of candidates to only 600. "I can handle 600," he says. "I can't handle 25,000."

For years, Jirtle had been investigating these curious genes whose pattern for turning on and off differs from the dominant/recessive model of classical genetics. During the formation of the sperm and egg, a molecular process called methylation imprints the genes with a mark that silences the copy coming from either the mother or father. The silenced copy of the gene is unavailable to compensate for possible flaws in the active copy, including flaws that may lead to disease.

In his lab, Jirtle had identified imprinted genes in mice and sheep and the corresponding genes in humans. In one project, his lab determined that a rare characteristic in sheep -- unusually big and muscular bottoms -- was caused by an imprinted gene. Such discoveries of individual imprinted genes were the essential first step to understanding the genes' broader role in diseases. But Jirtle found it prohibitively difficult to identify them more widely across the entire genome.

Meanwhile, Hartemink, who graduated from Duke in 1994 with degrees in mathematics, physics and economics, had returned to campus as a computer science professor. In collaborations with biology researchers, he was exploring ways the computer science method of "machine learning" could be used to find patterns in biological systems.

In the spring of 2003, Hartemink and Jirtle began exploring how they might work together. Hartemink was teaching a course on "computational functional genomics" and asked the students whether anyone wanted to help him and Jirtle tackle the challenge of analyzing vast sequences of genetic information and other biological data to identify imprinted genes.

"In Table 1 we present a set of human genes whose mouse homologs are predicted to be imprinted and which map to regions in the human genome that are linked to complex conditions with parent-of-origin-dependent inheritance," write Luedi, Jirtle and Hartemink in their Genome Research paper. Table (shown in part) reproduced with permission from Cold Spring Harbor Laboratory Press, copyright 2005, Luedi et al. 2005.

"I knew pretty much from the beginning I wanted to do it," Luedi said about choosing the project. "It combined my interest in sequence analysis and statistics."

Working with Jirtle, Luedi compiled data on mouse genes that are known to be imprinted. Then, working with Hartemink, he came up with an algorithm that could distinguish imprinted genes from non-imprinted ones.

"It involves a classification of 'yes' or 'no' -- 'imprinted' or 'not imprinted,'" he said. "The statistics community calls that 'regression'; computer science calls it 'machine learning.'"

Next, they ran the algorithm against the entire mouse genome, in which the imprinted status of most genes is unknown. That analysis identified 600 likely imprinted genes out of 23,788.
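For readers who want a feel for the train-then-predict workflow described above, here is a minimal sketch in Python. The article does not say which features or which classifier the team used, so everything specific here is an assumption: the two numeric features per gene, the toy values, the gene names, and the choice of a plain logistic regression are all hypothetical stand-ins, not the method from the paper.

```python
import math

def predict_proba(weights, x):
    """Probability that a gene with feature vector x is imprinted."""
    z = weights[-1] + sum(w * xj for w, xj in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic-regression weights (bias stored last) by gradient descent."""
    n_features = len(features[0])
    weights = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(features, labels):
            error = predict_proba(weights, x) - y
            for j, xj in enumerate(x):
                grad[j] += error * xj
            grad[-1] += error
        weights = [w - lr * g / len(features) for w, g in zip(weights, grad)]
    return weights

# Step 1: genes with known status (hypothetical feature values).
# Label 1 means "imprinted", 0 means "not imprinted".
known_features = [[0.9, 0.8], [0.8, 0.9], [0.7, 0.95],
                  [0.2, 0.1], [0.1, 0.3], [0.3, 0.2]]
known_labels = [1, 1, 1, 0, 0, 0]
weights = train_logistic(known_features, known_labels)

# Step 2: score genes whose status is unknown (hypothetical names and values).
unlabeled = {"geneA": [0.85, 0.9], "geneB": [0.15, 0.2], "geneC": [0.5, 0.5]}
predicted_imprinted = [name for name, x in sorted(unlabeled.items())
                       if predict_proba(weights, x) >= 0.5]
print(predicted_imprinted)
```

The same two-step shape applies at any scale: fit on the small set of genes whose status is known, then score every remaining gene and keep those above a chosen threshold.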

In the final step, the researchers looked for human genes that are located in regions of the human genome believed to contribute to certain diseases and that correspond to the mouse genes they predicted to be imprinted (see table).
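That final filtering step can be pictured as a simple lookup and overlap check. The sketch below is only an illustration under stated assumptions: the gene identifiers, chromosome positions, region boundaries and condition labels are invented placeholders, and the function and field names are hypothetical rather than taken from the study.

```python
from typing import NamedTuple

class Region(NamedTuple):
    """A stretch of the human genome linked to a condition (placeholder data)."""
    chromosome: str
    start: int
    end: int
    condition: str

def find_candidates(predicted_mouse, homologs, positions, regions):
    """Return (human gene, condition) pairs for predicted-imprinted mouse genes
    whose human counterparts fall inside a disease-linked region."""
    hits = []
    for mouse_gene in sorted(predicted_mouse):
        human_gene = homologs.get(mouse_gene)
        if human_gene is None or human_gene not in positions:
            continue  # no known human counterpart or no mapped position
        chromosome, position = positions[human_gene]
        for region in regions:
            if region.chromosome == chromosome and region.start <= position <= region.end:
                hits.append((human_gene, region.condition))
    return hits

# Placeholder inputs, invented for illustration only.
predicted_mouse = {"mouse_gene_1", "mouse_gene_2", "mouse_gene_3"}
homologs = {"mouse_gene_1": "HUMAN_A", "mouse_gene_2": "HUMAN_B"}
positions = {"HUMAN_A": ("chr1", 1500), "HUMAN_B": ("chr2", 9000)}
regions = [Region("chr1", 1000, 2000, "condition X"),
           Region("chr2", 100, 500, "condition Y")]

candidates = find_candidates(predicted_mouse, homologs, positions, regions)
print(candidates)
```

Only genes that pass both tests survive: the mouse gene must be predicted to be imprinted, and its human counterpart must sit inside a region already linked to a condition.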

The result is a paper that opens new avenues for scientists to decode the mysteries of imprinted genes, which have become the focus of intense scientific inquiry worldwide. "These collaborations are really truly collaborations because neither group would have been able to pull it off alone," Jirtle said.

Hartemink, Jirtle and Luedi are not the only scientists at Duke combining genetics experiments and computational methods. Part two of this story explores how the merger of math and genetics affects how young scientists are trained.