Dihydrofolate reductase (DHFR) catalyzes the NADPH-dependent reduction of dihydrofolate to tetrahydrofolate. As the only source of tetrahydrofolate (an important precursor in the biosynthesis of purines, thymidylate, and several amino acids), it has been a long-standing anti-cancer target and a classic system for structure-based drug design (SBDD). Escherichia coli DHFR (ecDHFR) is a canonical system for studying enzyme structure, dynamics, and catalysis. Protein flexibility and dynamics are central to understanding the structure and mechanism of DHFR, and both have been investigated extensively by computation and experiment. The conformation of the M20 loop is particularly important to the catalytic cycle, as its three major conformations (open, closed, and occluded) are known to regulate ligand affinity and turnover.
Traditional SBDD techniques focus on static structures. In 1999, Carlson and coworkers introduced the MPS (multiple protein structure) method as a way of incorporating protein flexibility into SBDD. The extreme importance of flexibility for DHFR makes the MPS method particularly appropriate. To improve the method, I developed new techniques for flooding and automatically clustering the solvent-mapping probes used in the procedure. I generated models from simulations starting with the M20 loop in both open and closed conformations. The MPS models preferentially identified high-affinity inhibitors over drug-like non-inhibitors.
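As a rough illustration of what automatic probe clustering can look like, the sketch below groups probe positions by single-linkage clustering with a distance cutoff. It is not the clustering procedure developed for the MPS work, which is not specified here; the function name `cluster_probes`, the 2.0 cutoff, and the toy coordinates are all assumptions made for the example.

```python
import math
from collections import defaultdict


def cluster_probes(coords, cutoff=2.0):
    """Group probe positions by single-linkage clustering.

    Two probes end up in the same cluster if they are connected by a
    chain of neighbors, each pair at most `cutoff` apart. The cutoff
    value and its units (angstroms) are assumptions for illustration.
    Returns lists of probe indices, largest cluster first.
    """
    parent = list(range(len(coords)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of probes that lies within the cutoff.
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                root_i, root_j = find(i), find(j)
                if root_i != root_j:
                    parent[root_j] = root_i

    groups = defaultdict(list)
    for i in range(len(coords)):
        groups[find(i)].append(i)
    return sorted(groups.values(), key=len, reverse=True)


# Toy coordinates (invented): two tight groups and one isolated probe.
probes = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.5, 0.0, 0.0),
    (10.0, 10.0, 10.0), (10.5, 10.0, 10.0),
    (50.0, 50.0, 50.0),
]
clusters = cluster_probes(probes)
print(clusters)  # [[0, 1, 2], [3, 4], [5]]
```

The pairwise loop is quadratic in the number of probes, which is acceptable for a sketch; a spatial index would be the usual choice for large probe sets.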
Proteins are dynamic structures, and we have shown that the incorporation of these dynamics significantly enhances drug design efforts [1,2]. These techniques are precise enough that we can produce models that preferentially select drugs that inhibit E. coli DHFR over human DHFR, opening many possibilities for future anti-bacterial studies.
Although the clustering methods involved in MPS drug design were originally thought to require direct human biochemical expertise and intervention, I have shown that they can be fully automated. Future projects include the application of the MPS method to additional systems, integration of computational topology techniques, and automation of the entire MPS workflow.
Databases of bound protein-ligand complexes are useful for everything from developing scoring functions for ligand docking to de novo design of enzyme inhibitors to the examination of the basic biophysical properties of protein-ligand interaction. Binding MOAD is the largest collection of high-quality protein-ligand complexes available from the Protein Data Bank (PDB). It is extensively hand-curated, including binding data (Ki, Kd, IC50) from all available primary literature sources.
Some proteins are relatively easy to crystallize. Others are known systems of great interest, and still others have come in and out of vogue over time. A straightforward analysis of biophysical properties based on all available PDB structures will be heavily biased towards such systems and thus misleading. Therefore, significant effort has gone into the production of an annotated, non-redundant dataset.
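To make the redundancy problem concrete, the sketch below reduces a list of complexes to one representative per protein family, so that heavily crystallized systems count once. It is a minimal illustration, not the curation procedure used for Binding MOAD; the record fields (`family`, `affinity_nm`, `resolution`), the selection rule, and every value in the toy data are invented for the example.

```python
import math


def pick_representatives(entries):
    """Keep one complex per protein family.

    Assumed selection rule (for illustration only): prefer entries that
    have binding data, then the tightest binder (lowest affinity value),
    then the best (lowest) crystallographic resolution.
    """

    def rank(entry):
        affinity = entry["affinity_nm"]
        return (
            affinity is None,                            # entries with data sort first
            affinity if affinity is not None else math.inf,
            entry["resolution"],
        )

    best = {}
    for entry in entries:
        current = best.get(entry["family"])
        if current is None or rank(entry) < rank(current):
            best[entry["family"]] = entry
    return best


# Invented placeholder records: family A is over-represented.
entries = [
    {"id": "a1", "family": "A", "affinity_nm": 50.0, "resolution": 2.0},
    {"id": "a2", "family": "A", "affinity_nm": 5.0, "resolution": 2.4},
    {"id": "a3", "family": "A", "affinity_nm": None, "resolution": 1.5},
    {"id": "b1", "family": "B", "affinity_nm": None, "resolution": 2.5},
    {"id": "b2", "family": "B", "affinity_nm": None, "resolution": 1.8},
]
representatives = pick_representatives(entries)
print({family: rep["id"] for family, rep in representatives.items()})
```

In the toy data, family A supplies three of the five entries but contributes a single representative, which is the effect a non-redundant set is meant to achieve.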
Future projects include the large-scale mining of Binding MOAD to answer basic biophysical questions related to protein-ligand binding, as well as more specific questions related to computational drug design.
In collaboration with Anthony Bak at Stanford and undergraduates at Earlham College, I am applying computational topological techniques to address these same questions without the need for complex biochemical input.