| 
Predictive computational modeling underlies all of the work in
the Jain Lab. This ranges from modeling protein-ligand
interactions to modeling the relationship between variation in
quantifiable molecular species and the behavior of complex
biological systems. The primary areas of
research in the lab are: 1) computational methods for
structure-based drug design, 2) computational approaches for
modeling human transcription and biological network structure,
and 3) computational and statistical methods to derive
quantitative conclusions from data gathered with high-throughput
biological measurement technologies. All of the approaches share
their roots in the use of empirically derived scoring functions
and compute-intensive search, optimization, and enumeration.
Following a long period of applied research in defense applications
and in speech understanding, Prof. Jain began a research career
exclusively focused on issues in computational chemistry and
computational biology. His foundational work in computer-aided drug
design was done in industry, beginning with the Compass and Hammerhead
techniques (see papers from 1994-1997). Compass introduced a new
representational scheme for capturing the 3D surface properties of
small molecules. This made it possible to address systematically a
previously neglected aspect of modeling small-molecule activity:
the choice of the relative alignment and conformation (or pose)
of competitive ligands, including the detailed relationship of their
hydrophobic shapes. A key insight, made with colleagues, was that the
choice of pose should be directly governed by the function being used
to predict binding affinity (essentially a direct analogy to physics
where the lowest energy state is sought). The difficulty was that the
function to predict activity was being induced at the same time as the
pose choice. The Compass method overcame this problem, and was one of
the foundational methods in establishing the field of
multiple-instance learning, as it has come to be known within the
Computer Science community. This work led to the development of one
of the first molecular docking programs described in the literature
to address ligand conformational flexibility. The Hammerhead docking system built
upon the molecular representations, multiple-instance approach, and
search strategy developed for Compass.
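
The interplay described above, where the pose is chosen by the same function that is being fitted to predict activity, can be illustrated with a deliberately small sketch. This is not the Compass algorithm: the single numeric feature per pose, the linear model, and the names `fit_line`, `select_poses`, and `multiple_instance_fit` are all assumptions made for illustration. Each molecule is treated as a "bag" of candidate poses, and the loop alternates between picking the best-scoring pose per molecule and refitting the scoring function on the picked poses.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y ~ w*x + b with a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        # All features identical: fall back to a constant predictor.
        return 0.0, my
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return w, my - w * mx


def select_poses(bags, w, b):
    """For each molecule (bag), pick the pose with the highest predicted score."""
    return [max(bag, key=lambda x: w * x + b) for bag in bags]


def multiple_instance_fit(bags, activities, max_iter=20):
    """Alternate between pose selection and refitting until the poses stop changing."""
    # Hypothetical initialisation: start from the first listed pose of each molecule.
    chosen = [bag[0] for bag in bags]
    w, b = fit_line(chosen, activities)
    for _ in range(max_iter):
        new_chosen = select_poses(bags, w, b)
        if new_chosen == chosen:
            break  # pose choice and scoring function are mutually consistent
        chosen = new_chosen
        w, b = fit_line(chosen, activities)
    return w, b, chosen


# Toy data: each molecule has two candidate poses, each summarised by one number.
bags = [[0.0, 1.0], [0.5, 2.0], [3.0, 1.0], [4.0, 0.2]]
activities = [3.0, 5.0, 7.0, 9.0]
w, b, chosen = multiple_instance_fit(bags, activities)
```

On this toy data the loop settles after one reselection, at which point the chosen poses and the fitted function agree with each other. A real system would use a much richer pose representation and a nonlinear model, and convergence to a good solution is not guaranteed in general.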
Subsequent work built on the foundation laid by Compass and
Hammerhead. These methods addressed problems in computation of
molecular diversity and prediction of ADME properties (see papers
from 1998-2000). Our most recent work in computational drug
design (see the Surflex methodological papers from 2003 onward)
is focused on pushing the frontiers of molecular docking and on
constructing ligand-based models of protein active sites in cases
where protein structure is unknown. The Surflex docking approach
is unique, both with respect to scoring function and search
methodology. Surflex-Dock is competitive with the best and most
widely available methods in terms of docking accuracy, and is
superior in terms of screening utility on publicly available
benchmarks (measuring enrichment of known ligands in virtual
screening experiments). We have recently made a substantial
innovation to the multiple-instance parameter estimation process
by generalizing our approach to now include negative training
data. Putative inactive molecules have been added to a set of
known active molecules in re-estimation of the scoring function
for the Surflex docking method. We have continued our work in
ligand-based modeling as well. The Surflex similarity method has
been augmented, both in search strategy and in its objective
function, to support the construction of ligand-based models of
protein activity. The models are competitive with the best
docking methods in terms of effectiveness in identifying novel
ligands, generalizing remarkably well even across different
chemical scaffolds.
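
Screening utility, as mentioned above, is measured by the enrichment of known ligands in virtual screening experiments. One common way to quantify this is an enrichment factor: the hit rate among the top-ranked fraction of a screened library divided by the hit rate in the whole library. The sketch below is a generic illustration and is not taken from the Surflex papers; the function name `enrichment_factor` and the choice of `top_fraction` are assumptions.

```python
def enrichment_factor(scores, is_active, top_fraction=0.1):
    """Hit rate in the top-scoring fraction divided by the overall hit rate.

    scores      -- docking (or similarity) scores, higher meaning better
    is_active   -- 1/0 or True/False flags marking the known ligands
    """
    if len(scores) != len(is_active):
        raise ValueError("scores and is_active must have the same length")
    total_actives = sum(is_active)
    if total_actives == 0:
        raise ValueError("at least one known active is required")

    # Rank the library from best score to worst.
    ranked = sorted(zip(scores, is_active), key=lambda pair: pair[0], reverse=True)
    n_top = max(1, round(top_fraction * len(ranked)))
    hits_in_top = sum(active for _, active in ranked[:n_top])

    top_rate = hits_in_top / n_top
    overall_rate = total_actives / len(ranked)
    return top_rate / overall_rate


# Toy library of ten molecules in which both known ligands are ranked first.
scores = [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
is_active = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
ef = enrichment_factor(scores, is_active, top_fraction=0.2)
```

A value of 1 corresponds to random ranking, and larger values mean that known ligands are concentrated near the top of the list.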
Research within the lab has branched out to encompass larger
biological scales. Work on transcription factor binding site
modeling and promoter similarity addresses the protein-DNA
scale, and predictive modeling of biological networks addresses
the protein-protein interaction scale. Note, however,
that while our approaches in these areas do not encompass atomic
detail, as in the protein-ligand area, they make use of exactly
the same foundational computational approach: combination of
empirically derived scoring functions with search and
optimization. In particular, our promoter similarity work was
directly analogous to the morphological similarity work with
small molecules. Application of the idea to promoters revealed an
unexpected link between repetitive elements and
transcription (see Hon and Jain, 2003). Similarly, our algorithm
for transcription factor motif discovery is precisely analogous
to the ligand-based modeling within Surflex-Sim. Results with the
MaMF algorithm (see Hon and Jain, 2006) are superior to other
widely available high-performing methods on publicly available
benchmarks. At the scale of protein-protein interactions, our
QPACA method (see Novak and Jain, 2006) seeks to recognize when a
proposed collection of genes is part of a biological pathway and
seeks to predict which new genes should be added to a specified
pathway. The approach again combines scoring functions with
optimization methods. In each of these areas, we have generalized
an approach to predictive modeling, with success in making
biologically relevant predictions.
The foregoing has addressed our research spanning protein-ligand,
protein-DNA, and protein-protein interactions and interaction
networks. We also have an interest at the organ and organismal
scale. Our efforts in this area, relating variation in
measurements of molecular species to phenotype or patient outcome,
are again based on compute-intensive, empirical, enumerative
approaches. We were one of the first groups to publish a
permutation-based approach to analysis of microarray data to
address problems in this area (Jain et al. 2001, PNAS). Our
continuing work here has led to a number of fruitful
collaborations with experimentalists, as evidenced by numerous
applied cancer research publications with Jain Lab members as
co-authors.
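
To make the idea of a permutation-based analysis concrete, here is a generic two-group permutation test for a single measured species, such as one gene's expression across two sets of samples. It is a textbook sketch and not the specific procedure of Jain et al. 2001; the function name `permutation_p_value`, the difference-of-means statistic, and the default `n_perm` and `seed` are assumptions.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation p-value for a difference in group means.

    Sample labels are shuffled repeatedly to build an empirical null
    distribution, so no parametric assumption about the data is needed.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)

    at_least_as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1

    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


# Toy measurements for one gene in two groups of five samples each.
p_separated = permutation_p_value([10, 11, 12, 13, 14], [1, 2, 3, 4, 5])
p_identical = permutation_p_value([1, 2, 3], [1, 2, 3])
```

With thousands of genes measured at once, a per-gene p-value like this would still need a multiple-testing adjustment, which this sketch does not attempt.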
|