Fundamentally, all of the work in the Jain Lab falls within the paradigm of predictive computational modeling. This ranges from modeling protein-ligand interactions to modeling the relationship between variation in quantifiable molecular species and the behavior of complex biological systems. The primary areas of research in the lab are: 1) computational methods for structure-based drug design, 2) computational approaches for modeling human transcription and biological network structure, and 3) computational and statistical methods for drawing quantitative conclusions from data gathered with high-throughput biological measurement technologies. All of these approaches are rooted in the use of empirically derived scoring functions combined with compute-intensive search, optimization, and enumeration.

Following a long period of applied research in defense applications and speech understanding, Prof. Jain began a research career focused exclusively on problems in computational chemistry and computational biology. His foundational work in computer-aided drug design was done in industry, beginning with the Compass and Hammerhead techniques (see papers from 1994-1997). Compass introduced a new representational scheme for capturing the 3D surface properties of small molecules, which made it possible to systematically address a previously neglected aspect of modeling small-molecule activity: the choice of the relative alignment and conformation (or pose) of competitive ligands, including the detailed relationship of their hydrophobic shapes. A key insight, made with colleagues, was that the choice of pose should be governed directly by the function being used to predict binding affinity (a direct analogy to physics, where the lowest-energy state is sought). The difficulty was that the function for predicting activity was being learned at the same time as the poses were being chosen. The Compass method overcame this problem and was one of the foundational methods in establishing the field now known within the computer science community as multiple-instance learning. This work led to the development of one of the first published molecular docking programs to address ligand conformational flexibility. The Hammerhead docking system built upon the molecular representations, multiple-instance approach, and search strategy developed for Compass.
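The alternating idea behind the key insight above (choose each molecule's pose with the current predictive function, then re-learn the function from those poses) can be illustrated with a toy sketch. This is not the Compass algorithm: we assume a plain linear scoring function fit by lightly ridge-regularized least squares over made-up pose feature vectors, and every name here (`multiple_instance_fit`, `fit_ridge`, `predict`, `solve`) is hypothetical.

```python
from typing import List, Sequence, Tuple

Pose = List[float]  # hypothetical feature vector describing one candidate pose


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    # Linear score: bias w[0] plus weighted pose features.
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    # Gaussian elimination with partial pivoting for a small square system a w = b.
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (m[r][n] - sum(m[r][c] * w[c] for c in range(r + 1, n))) / m[r][r]
    return w


def fit_ridge(xs: Sequence[Pose], ys: Sequence[float], lam: float = 1e-3) -> List[float]:
    # Least squares with a small ridge term (applied to every coefficient,
    # bias included) for numerical stability: (X^T X + lam I) w = X^T y.
    rows = [[1.0] + list(x) for x in xs]
    d = len(rows[0])
    a = [[sum(r[i] * r[j] for r in rows) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(d)]
    return solve(a, b)


def multiple_instance_fit(
    bags: Sequence[Sequence[Pose]], activities: Sequence[float], n_iter: int = 20
) -> Tuple[List[float], List[Pose]]:
    # bags[k] holds the candidate poses of molecule k; only the molecule's
    # activity is known, not which pose is responsible for it.
    chosen = [list(bag[0]) for bag in bags]  # arbitrary initial pose choice
    w = fit_ridge(chosen, activities)
    for _ in range(n_iter):
        # Pose choice is governed by the current predictive function:
        # each molecule takes the pose that the function scores highest.
        new_chosen = [list(max(bag, key=lambda x: predict(w, x))) for bag in bags]
        if new_chosen == chosen:
            break  # pose choices are stable, so the fit is too
        chosen = new_chosen
        w = fit_ridge(chosen, activities)  # re-learn the function from those poses
    return w, chosen
```

For example, with one-feature bags `[[[0.5], [1.0]], [[2.0], [0.0]], [[3.0], [1.0]]]` and activities `[2.0, 4.0, 6.0]`, the loop replaces the first molecule's initial pose `[0.5]` with `[1.0]` and settles on a slope close to 2. Real systems use far richer representations and nonlinear functions; the sketch only shows why pose selection and function fitting have to be solved together.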

Subsequent work built on the foundation laid by Compass and Hammerhead, addressing problems in the computation of molecular diversity and the prediction of ADME properties (see papers from 1998-2000). Our most recent work in computational drug design (see the Surflex methodological papers from 2003 onward) focuses on pushing the frontiers of molecular docking and on constructing ligand-based models of protein active sites when the protein structure is unknown. The Surflex docking approach is distinctive in both its scoring function and its search methodology. Surflex-Dock is competitive with the best and most widely available methods in docking accuracy, and it is superior in screening utility on publicly available benchmarks (measured by the enrichment of known ligands in virtual screening experiments). We have recently made a substantial innovation in the multiple-instance parameter estimation process by generalizing it to include negative training data: putative inactive molecules are now added to the set of known active molecules when re-estimating the scoring function for the Surflex docking method. We have continued our work in ligand-based modeling as well. The Surflex similarity method has been augmented, in both its search strategy and its objective function, to support the construction of ligand-based models of protein activity. These models are competitive with the best docking methods in identifying novel ligands, and they generalize remarkably well even across different chemical scaffolds.
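One common way to quantify the screening enrichment mentioned above is the enrichment factor at a fixed fraction of a ranked database: the hit rate among the top-ranked compounds divided by the hit rate of the whole database. The sketch below is a generic illustration, assuming that higher docking scores are better; the function name `enrichment_factor` is hypothetical, and we make no claim that this is the exact measure used in the Surflex benchmark papers.

```python
import math
from typing import Sequence


def enrichment_factor(
    scores: Sequence[float], is_active: Sequence[bool], fraction: float
) -> float:
    # Assumption: a higher score means a better-ranked compound.
    if len(scores) != len(is_active) or not scores:
        raise ValueError("scores and is_active must be non-empty and equal in length")
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    total_actives = sum(1 for a in is_active if a)
    if total_actives == 0:
        raise ValueError("at least one known active is required")
    # Rank the whole database, best score first.
    ranked = sorted(zip(scores, is_active), key=lambda p: p[0], reverse=True)
    n_top = max(1, math.ceil(fraction * len(ranked)))
    hits_top = sum(1 for _, active in ranked[:n_top] if active)
    # Hit rate in the top slice relative to the hit rate expected at random.
    return (hits_top / n_top) / (total_actives / len(ranked))
```

With ten compounds, two of them known actives, a method that ranks both actives first scores an enrichment factor of 5 at the top 20%, while random ranking would average about 1.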

Research within the lab has branched out to larger biological scales: work on transcription factor binding site modeling and promoter similarity addresses the protein-DNA scale, and predictive modeling of biological networks addresses the protein-protein interaction scale. Although our approaches in these areas do not model atomic detail as our protein-ligand work does, they rest on exactly the same foundational computational approach: empirically derived scoring functions combined with search and optimization. In particular, our promoter similarity work was directly analogous to our work on the morphological similarity of small molecules. Applying this idea to promoters revealed an unexpected link between repetitive elements and transcription (see Hon and Jain 2003). Similarly, our algorithm for transcription factor motif discovery is precisely analogous to the ligand-based modeling within Surflex-Sim. Results with the MaMF algorithm (see Hon and Jain, 2006) are superior to those of other widely available high-performing methods on publicly available benchmarks. At the scale of protein-protein interactions, our QPACA method (see Novak and Jain, 2006) seeks to recognize when a proposed collection of genes forms part of a biological pathway and to predict which new genes should be added to a specified pathway. This approach again combines scoring functions with optimization methods. In each of these areas, we have generalized a single approach to predictive modeling and have succeeded in making biologically relevant predictions.

The foregoing has addressed our research spanning protein-ligand, protein-DNA, and protein-protein interactions and interaction networks. We are also interested in the organ and organismal scales. Our efforts in this area, which relate variation in measurements of molecular species to phenotype or patient outcome, are again based on compute-intensive, empirical, enumerative approaches. We were one of the first groups to publish a permutation-based approach to the analysis of microarray data for problems in this area (Jain et al. 2001, PNAS), and our continuing work here has led to a number of fruitful collaborations with experimentalists, as evidenced by the many reports of applied cancer research on which members of the Jain lab are co-authors.
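The core idea of a permutation-based analysis is to build the null distribution of a statistic by randomly relabeling samples, rather than relying on distributional assumptions. The sketch below is a generic two-group permutation test on the difference in means for a single measured species; it is not a reconstruction of the analysis in Jain et al. 2001, and the function name `permutation_p_value` and its defaults are hypothetical.

```python
import random
from typing import Sequence


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


def permutation_p_value(
    group_a: Sequence[float], group_b: Sequence[float],
    n_perm: int = 10000, seed: int = 0
) -> float:
    # Two-sided test of the difference in group means for one measured species
    # (e.g. one gene's expression across two patient groups).
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel samples at random under the null hypothesis
        stat = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if stat >= observed - 1e-12:  # tolerance absorbs floating-point round-off
            extreme += 1
    # Add-one correction keeps the estimate strictly positive.
    return (extreme + 1) / (n_perm + 1)
```

In a whole-array experiment the same test would be repeated for every gene, and the resulting p-values would then need a multiple-testing correction before any gene is declared significant.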

All Rights Reserved 2008. http://www.jainlab.org