Probing structure of every protein is a massive job
All of us have considered at some time or another counting exceedingly large numbers: the stars in the sky, the grains of sand on a beach. But for Dartmouth computer scientist Bruce Donald, such a task is not a passing fancy but his life's work.
Goal: The ambitious goal of his current project is to work out the structures of all the proteins in nature, both plant and animal, under the auspices of a five-year, $1.2-million grant from the National Institute of General Medical Sciences.
It's rare for a computer scientist to be the principal investigator on a grant from the National Institutes of Health, says Donald, but he feels up to the challenge. When asked how many proteins might be involved, he replies, "Let's start with a human. The number of proteins is a function of the number of genes. That number keeps changing, but currently it's around 22,000.
"I'd be very surprised if the total [number of proteins for all species] is less than 100,000," Donald continues. "I would not be surprised if it was a million. But I would be surprised if it were more than two million."
How does one go about such a task? A good interdisciplinary team is the key. First, biochemists clone a gene, coax it to express a protein, and then purify the protein to the nth degree. The protein not only has to be pure, but each molecule has to fold in precisely the same way. The protein must then be dissolved in water and subjected to so-called solution nuclear magnetic resonance (NMR).
In this technique, the spectrometer makes tens of thousands of measurements of bond angles and distances between hydrogen nuclei. Just as numerous measurements by a surveyor go into the creation of a topographical map, the NMR data contains all the elements needed to prepare a three-dimensional map of the protein. The problem is how to extract them.
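The surveyor analogy can be made concrete with a small sketch. It is not the algorithm developed by Donald's group. It is a textbook construction that places four points in space given only the six distances between them, and the function names (embed_four_points, pairwise_distances) and the sample coordinates are invented for illustration.

```python
import math

def pairwise_distances(points):
    """Return the symmetric matrix of distances between 3D points."""
    return [[math.dist(p, q) for q in points] for p in points]

def embed_four_points(d):
    """Place four points in 3D so their pairwise distances match d.

    d is a 4x4 symmetric distance matrix (a hypothetical input format).
    Assumes the first three points are distinct and not collinear.
    """
    d01, d02, d03 = d[0][1], d[0][2], d[0][3]
    d12, d13, d23 = d[1][2], d[1][3], d[2][3]

    # Point 0 sits at the origin; point 1 lies on the x axis.
    p0 = (0.0, 0.0, 0.0)
    p1 = (d01, 0.0, 0.0)

    # Point 2 lies in the xy plane, fixed by its distances to points 0 and 1.
    x2 = (d02**2 + d01**2 - d12**2) / (2 * d01)
    y2 = math.sqrt(max(d02**2 - x2**2, 0.0))
    p2 = (x2, y2, 0.0)

    # Point 3 is fixed by its distances to points 0, 1, and 2.
    x3 = (d03**2 + d01**2 - d13**2) / (2 * d01)
    y3 = (d03**2 + d02**2 - d23**2 - 2 * x2 * x3) / (2 * y2)
    # Distances alone cannot tell a shape from its mirror image,
    # so the positive root is chosen arbitrarily here.
    z3 = math.sqrt(max(d03**2 - x3**2 - y3**2, 0.0))
    p3 = (x3, y3, z3)

    return [p0, p1, p2, p3]

# Made-up coordinates standing in for four nuclei.
true_points = [(1.0, 2.0, 3.0), (2.5, 2.0, 3.4), (0.8, 3.1, 2.7), (1.9, 2.6, 4.2)]
measured = pairwise_distances(true_points)
recovered = embed_four_points(measured)
```

The recovered points differ from the originals by a rotation, a shift, and possibly a reflection, but every distance between them is reproduced. Real NMR data is far larger, noisy, and incomplete, which is why extracting the structure takes dedicated algorithms.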
It is at this point that Donald's group steps in. Their work is based on an undergraduate honors thesis by Alik Widge, a 1999 Dartmouth College graduate who is now an M.D.-Ph.D. student at Carnegie Mellon. With Donald's help, he formulated a computer algorithm for determining protein structure from the NMR data; his thesis won Dartmouth's Kemeny Computing Prize. The original algorithm has since been refined by Donald's group, and related algorithms have been developed. Now the group is testing them on NMR data collected from a variety of sources, at DMS and elsewhere.
Mass: The team is also probing protein structure using techniques complementary to NMR, including x-ray crystallography, mass spectrometry, and computational modeling.
The applications of the work are legion. For example, knowing the structure of a receptor could contribute to developing more specific drugs, more potent drugs, or drugs with fewer side effects. Or knowing the structure of an enzyme could suggest ways of modifying it to produce more efficient catalytic activity.
By looking at mass spectrometry data on serum from patients with prostate cancer and from normal controls, Donald and a colleague were able to construct what's called a "learning" algorithm that distinguishes with an accuracy of better than 97% between cancer patients and healthy patients. The results with ovarian cancer were even better: 100% accuracy.
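To show what a "learning" algorithm and its accuracy figure mean in practice, here is a minimal sketch. The article does not say which method Donald and his colleague used, so this swaps in a simple nearest-centroid classifier, and the two-number "spectra" and labels are toy values made up for illustration, not patient data.

```python
def centroid(rows):
    """Average each feature across a list of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(samples):
    """Learn one centroid per label from (features, label) pairs."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {label: centroid(rows) for label, rows in by_label.items()}

def predict(model, features):
    """Assign the label whose centroid is closest to the features."""
    return min(model, key=lambda label: squared_distance(model[label], features))

def accuracy(model, samples):
    """Fraction of samples whose predicted label matches the true label."""
    correct = sum(predict(model, f) == label for f, label in samples)
    return correct / len(samples)

# Toy training data: two hypothetical peak intensities per serum sample.
training = [
    ((5.1, 1.0), "cancer"), ((4.8, 1.3), "cancer"), ((5.4, 0.9), "cancer"),
    ((2.0, 3.1), "control"), ((2.3, 2.8), "control"), ((1.8, 3.3), "control"),
]
# Samples held back from training, used only to measure accuracy.
held_out = [
    ((5.0, 1.1), "cancer"), ((2.1, 3.0), "control"),
    ((4.6, 1.5), "cancer"), ((2.5, 2.6), "control"),
]

model = train(training)
score = accuracy(model, held_out)
```

Accuracy is measured on samples the algorithm never saw during training, since a score computed on the training data alone would overstate how well the method generalizes.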
Size: Another possible application may be measuring serum proteins during chemotherapy to quickly assess treatment outcomes. Now, oncologists must wait before they can evaluate a therapy's effectiveness. But changes in serum proteins may prove to be a more sensitive indicator of efficacy than, say, regression in tumor size.
To keep tabs on Donald's progress, check out his website at www.cs.Dartmouth.edu/~brd/.