Genome Biology

The essentially complete decoding of the human genome sequence in 2003 was a landmark event. However, one could argue that sequencing was the easy part. Now the really hard work begins: what do these three billion base pairs mean? We know that the overwhelming majority do not code for proteins. Are they mostly just "junk" left over from millions of years of evolution? The availability of large amounts of genome sequence allows us to ask other questions, too: how do our cells manage genomic information? What sorts of signals do they use that are not dictated by our DNA sequence alone? How do fewer than 25,000 genes produce the 80,000 or more proteins found in our bodies?

At Duke, researchers are taking a variety of approaches to these problems. One is through "epigenomics," the study of chemical modifications of genes that are passed on from one cell generation to the next and affect gene expression, yet do not alter the DNA sequence itself. Such chromatin modifications, found in organisms ranging from yeast to humans, often inhibit gene expression; such selective inhibition explains why most genes residing on a woman's two X chromosomes are expressed from only one copy of the X. These modifications have also been implicated in a number of diseases, from cancer to birth defects and heart disease.

Genome biologists in the IGSP are also exploring how nature has engineered genes and cells in ways that are analogous to computer programs. Part of their work involves reverse engineering: programming cancer cells to commit suicide, or healthy cells to make drugs or deliver therapies. Other investigators are studying the myriad ways in which the human genome is organized to control gene expression, chromosome behavior and RNA splicing. For example, both laboratory and computational experiments at Duke are aimed at determining how a process called alternative splicing prunes RNA molecules into multiple forms, each capable of giving rise to a distinct protein.
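
To see how alternative splicing lets one gene yield several RNA forms, consider the following minimal Python sketch. It is a toy illustration, not a description of any method used at Duke: the gene, its exon names (`E1` to `E4`), the sequences, and the function `splice_isoforms` are all hypothetical, and the only splicing pattern modeled is keeping or skipping an optional exon.

```python
from itertools import product


def splice_isoforms(exons, optional):
    """Return every RNA form obtained by keeping or skipping each optional exon.

    exons: ordered list of (name, sequence) pairs for one hypothetical gene.
    optional: set of exon names that may be skipped.
    """
    optional_names = [name for name, _ in exons if name in optional]
    isoforms = {}
    # Try every keep/skip combination of the optional exons.
    for choices in product([True, False], repeat=len(optional_names)):
        keep = dict(zip(optional_names, choices))
        # Exons that are not optional are always kept.
        kept = [(name, seq) for name, seq in exons if keep.get(name, True)]
        label = "-".join(name for name, _ in kept)
        isoforms[label] = "".join(seq for _, seq in kept)
    return isoforms


# Toy gene with four exons, two of them optional (made-up sequences).
toy_exons = [("E1", "AUGGCU"), ("E2", "GAAUUC"), ("E3", "CCGGAU"), ("E4", "UAA")]
isoforms = splice_isoforms(toy_exons, optional={"E2", "E3"})

for label, sequence in isoforms.items():
    print(label, sequence)
```

With two optional exons, this toy gene gives four distinct RNA forms; each additional optional exon doubles the count, which suggests how a modest number of genes could account for a much larger number of proteins.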