History

Following a long period of applied research in defense applications and in speech understanding, Prof. Jain began a research career focused exclusively on computational chemistry and computational biology. His foundational work in computer-aided drug design was done in industry, beginning with the Compass and Hammerhead techniques. Compass involved a new representational scheme for capturing the 3D surface properties of small molecules that made it possible to systematically address the relative alignment and conformation (or pose) of competitive ligands. A key insight, made with colleagues, was that the choice of pose should be directly governed by the function being used to predict binding affinity. The difficulty was that the function to predict activity was being induced at the same time as the pose choice. The Compass method overcame this problem, helping to establish the field of multiple-instance learning, as it has come to be known within the Computer Science community. This work led to the development of one of the first flexible molecular docking programs (Hammerhead) and then to a broadening set of research areas in predictive molecular modeling.
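The coupling between pose choice and the learned activity function can be illustrated with a toy alternating procedure. This is a minimal sketch of the general multiple-instance idea, not the Compass algorithm: each molecule is a "bag" of candidate poses, each pose is a plain feature vector, the model is linear, and the function names (`select_poses`, `fit_weights`, `mil_fit`), the starting weights, and the learning rate are all assumptions made for illustration.

```python
def predict(weights, features):
    """Linear activity prediction for one pose (illustrative model only)."""
    return sum(w * x for w, x in zip(weights, features))


def select_poses(weights, bags):
    """For each molecule, pick the pose the current model scores highest."""
    return [max(poses, key=lambda p: predict(weights, p)) for poses in bags]


def fit_weights(weights, chosen, activities, lr=0.05, steps=200):
    """Gradient descent on mean squared error over the chosen poses."""
    w = list(weights)
    n = len(chosen)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(chosen, activities):
            err = predict(w, x) - y
            for j, xj in enumerate(x):
                grad[j] += 2.0 * err * xj / n
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w


def mil_fit(bags, activities, n_features, rounds=10):
    """Alternate between choosing poses and refitting the model."""
    w = [1.0] * n_features  # arbitrary starting point (assumption)
    for _ in range(rounds):
        chosen = select_poses(w, bags)          # pose choice depends on the model
        w = fit_weights(w, chosen, activities)  # model depends on the chosen poses
    return w
```

Each round uses the current model to choose one pose per molecule and then refits the model on those poses, so neither the poses nor the function is fixed in advance. The fixed learning rate is only suitable for small, well-scaled toy features.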

Basic Tools

Small Molecule Energetics and Conformations


 

A continuing frustration with the quality of available tools led to a new focus on basic aspects of small-molecule preparation:

  • 2D to 3D conversion (from SMILES or SDF)
  • Chirality detection and enumeration
  • Protonation
  • Conformer generation
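As a small illustration of the enumeration step in the list above, the sketch below generates every R/S assignment for a set of stereocenters whose configuration is unspecified. It assumes the centers have already been detected and are identified by atom index; the function name `enumerate_stereoisomers` is hypothetical, and the sketch deliberately ignores symmetry (for example meso forms), so it is not a description of the methods used in our tools.

```python
from itertools import product


def enumerate_stereoisomers(center_ids, fixed=None):
    """Return one {atom_index: 'R' or 'S'} dict per stereo configuration.

    center_ids: atom indices of detected stereocenters (assumed given).
    fixed: optional dict of centers whose configuration is already specified.
    """
    fixed = fixed or {}
    free = [c for c in center_ids if c not in fixed]
    results = []
    for labels in product("RS", repeat=len(free)):
        assignment = dict(fixed)              # keep the specified centers
        assignment.update(zip(free, labels))  # fill in the unspecified ones
        results.append(assignment)
    return results
```

With three centers of which one is specified, the two free centers yield four candidate configurations, each of which would then be passed on to 3D structure generation.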

Our approaches are different in that they are:

  • Template-free and non-stochastic
  • Purely reliant on a forcefield for structure generation
  • Accurate on typical drug-like ligands, with better coverage of diverse conformations, without fitting to known libraries of structures
  • Unusually fast and accurate for macrocyclic ligands
  • Capable of incorporating NMR restraints, which is particularly useful for large peptidic macrocycles

Selected Publications

Jain, A. N., Cleves, A. E., Gao, Q., Wang, X., Liu, Y., Sherer, E. C., and Reibarkh, M. Y. (2019). Complex macrocycle exploration: Parallel, heuristic, and constraint-based conformer generation using ForceGen. JCAMD, 33(6), 531-558. Open Access

Cleves, A. E. and Jain, A. N. (2017). ForceGen 3D structure and conformer generation: From small lead-like molecules to macrocyclic drugs. JCAMD, 31(5), 419-439. Open Access

 

The following recent studies all depended on the FGen3D and ForceGen methods:

Cleves, A. E. and Jain, A. N. (2020). Structure-Based and Ligand-Based Virtual Screening on DUD-E+: Performance Dependence on Approximations to the Binding-Pocket. JCIM. Open Access

Cleves, A. E., Johnson, S. R., and Jain, A. N. (2019). Electrostatic-field and surface-shape similarity for virtual screening and pose prediction. JCAMD, 33, 865-886. Open Access

Cleves, A. E. and Jain, A. N. (2018). Quantitative Surface Field Analysis: Learning Causal Models to Predict Ligand Binding Affinity and Pose. JCAMD, 32, 731-757. Open Access

 

Molecular Similarity

Surfaces, Electrostatics, and Hydrogen Bonding


 

There are three key areas of application for 3D ligand similarity methods:

  • Virtual screening
  • Pose prediction
  • Multiple ligand alignment

Our approaches are different in that they:

  • Are based on a fundamentally different representation of molecular shape; we employ molecular surfaces
  • Directly compare Coulombic electrostatic fields
  • Model hydrogen bonding in a directionally sensitive manner 
  • Are extremely fast: screening speeds of over 20 million compounds per day on a single computing core
  • Are integrated into other methods both directly and by using a general probabilistic approach related to ideas in statistical physics
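To make the idea of comparing Coulombic fields concrete, the sketch below evaluates the electrostatic potential of two sets of point charges at shared observer points and compares the two resulting field vectors. It is a simplified illustration under stated assumptions, not our similarity method: the charges are bare point charges in arbitrary units, the observer points are supplied by the caller, the distance floor `min_dist` is an arbitrary guard against singularities, and a normalized dot product stands in for whatever comparison function a real method would use.

```python
import math


def coulomb_potential(charges, point, min_dist=1.0):
    """Sum q / r over point charges; charges is a list of (q, (x, y, z))."""
    total = 0.0
    for q, pos in charges:
        r = max(math.dist(pos, point), min_dist)  # floor avoids division by ~0
        total += q / r
    return total


def field_similarity(mol_a, mol_b, observers):
    """Normalized dot product of the two potential vectors, in [-1, 1]."""
    va = [coulomb_potential(mol_a, p) for p in observers]
    vb = [coulomb_potential(mol_b, p) for p in observers]
    dot = sum(a * b for a, b in zip(va, vb))
    norm = math.sqrt(sum(a * a for a in va) * sum(b * b for b in vb))
    return dot / norm if norm > 0 else 0.0
```

Under this measure a molecule compared with itself scores 1, and the same geometry with every charge inverted scores -1, which is the behavior one wants from a comparison that is sensitive to the sign of the field and not only to shape.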

Selected Publications

Cleves, A. E., Johnson, S. R., and Jain, A. N. (2019). Electrostatic-field and surface-shape similarity for virtual screening and pose prediction. JCAMD, 33, 865-886. Open Access

Cleves, A. E. and Jain, A. N. (2018). Quantitative Surface Field Analysis: Learning Causal Models to Predict Ligand Binding Affinity and Pose. JCAMD, 32, 731-757. Open Access

Cleves, A. E. and Jain, A. N. (2015). Chemical and Protein Structural Basis for Biological Crosstalk Between PPAR-alpha and COX Enzymes. JCAMD, 29(2), 101-112. Open Access

Yera, E. R., Cleves, A. E., & Jain, A. N. (2014). Prediction of off-target drug effects through data fusion. In Pacific Symposium on Biocomputing (Vol. 19, pp. 160-171). Open Access

Yera, E.R., Cleves, A.E., and Jain, A.N. (2011). Chemical Structural Novelty: On-Targets and Off-Targets. J Med Chem, 54: 6771-6785. Open Access

Cleves, A. E. and Jain, A.N. (2006). Robust Ligand-Based Modeling of the Biological Targets of Known Drugs. J Med Chem, 49, 2921-2938

Cleves, A.E. and Jain, A.N. (2008). Effects of Inductive Bias on Computational Evaluations of Ligand-Based Modeling and on Drug Discovery. JCAMD, 22, 147-159

Jain, A.N. (2004). Ligand-Based Structural Hypotheses for Virtual Screening. J Med Chem. 47, 947-961.

Jain, A.N. (2000). Morphological similarity: A 3D molecular similarity method correlated with protein-ligand recognition. JCAMD 14, 199-213.

Docking and Binding Site Analysis

Empirical Scoring Functions, Search, and Optimization


 

We are interested in all aspects of docking and binding-site analysis:

  • Large-scale PDB retrieval and processing
  • Real-space refinement to model ligand binding as conformational ensembles
  • Surface-based binding site alignment using the PSIM method and optimal pocket variant selection
  • Virtual screening
  • Pose prediction

Our approaches are different in that they are:

  • Fully automated for both alignment and selection of appropriate binding site variants
  • Robust in fully automatic modes for virtual screening and pose prediction
  • Very extensively validated on public benchmarks, both independently and within our lab
  • Highly accurate for non-cognate ligand docking, employing a hybrid approach using ligand similarity
  • Directly applicable to synthetic macrocycles, with accuracy similar to that seen for non-macrocycles

Selected Publications

Jain, A.N., Brueckner, A.C., Cleves, A.E., Reibarkh, M.Y., and Sherer, E.C. (2023). A Distributional Model of Bound Ligand Conformational Strain: From Small Molecules up to Large Peptidic Macrocycles. JMC. Open Access

Jain, A.N., Cleves, A.E., Brueckner, A.C., Lesburg, C.A., Deng, Q., Sherer, E.C., and Reibarkh, M.Y. (2020). XGen: Real-Space Fitting of Complex Ligand Conformational Ensembles to X-ray Electron Density Maps. JMC. Open Access

Cleves, A. E. and Jain, A. N. (2020). Structure-Based and Ligand-Based Virtual Screening on DUD-E+: Performance Dependence on Approximations to the Binding-Pocket. JCIM. Open Access

Cleves, A. E. and Jain, A. N. (2015). Knowledge-Guided Docking: Accurate Prospective Prediction of Bound Configurations of Novel Ligands using Surflex-Dock. JCAMD, 29(6), 485-509. Open Access

Cleves, A. E. and Jain, A. N. (2015). Chemical and Protein Structural Basis for Biological Crosstalk Between PPAR-alpha and COX Enzymes. JCAMD, 29(2), 101-112. Open Access

Spitzer, R., Cleves, A. E., Varela, R., and Jain, A. N. (2013). Protein function annotation by local binding site surface similarity. Proteins: Structure, Function, and Bioinformatics. Open Access

Spitzer, R., and Jain, A.N. (2012). Surflex-Dock: Docking Benchmarks and Real-World Application. JCAMD, 26: 687-699.

Spitzer, R., Cleves, A.E., and Jain, A.N. (2011) Surface-Based Protein Binding Pocket Similarity. Proteins, 79: 2746-2763.

Jain, A.N. (2009). Effects of Protein Conformation in Docking: Improved Pose Prediction Through Protein Pocket Adaptation. JCAMD, 23: 355-374.

Jain, A.N. (2008). Bias, Reporting, and Sharing: Computational Evaluations of Docking Methods. JCAMD, 22, 201-212.

Pham, T. A. and Jain, A.N. (2008). Customizing Scoring Functions for Docking. JCAMD, 22, 269-286.

Ruppert, J., Welch, W. & Jain, A.N. (1997). Automatic identification and representation of protein binding sites for molecular docking. Protein Sci 6, 524-33.

Welch, W., Ruppert, J. & Jain, A.N. (1996). Hammerhead: Fast, fully automated docking of flexible ligands to protein binding sites. Chem Biol 3, 449-62.

Jain, A.N. (1996). Scoring noncovalent protein-ligand interactions: A continuous differentiable function tuned to compute binding affinities. J Comput Aided Mol Des 10, 427-40.

Affinity Prediction

Machine-Learning Approaches for Predicting Binding Affinity and Pose


 

Affinity prediction is an extremely challenging area, especially when target structures are not known. Our research employs machine learning (not direct physics-based simulation) for the following subtasks:

  • Multiple ligand alignment for molecular series that include multiple scaffolds
  • Incorporation of known binding site information
  • Binding site model induction using a multiple-instance learning approach
  • Prediction of both binding affinity and binding mode of new ligands
  • Iterative refinement of models with new data

Our approaches are different in that:

  • They implement fully automatic approaches for model building, including all aspects of ligand conformation and alignment
  • The binding site model (a “pocket-field”) is analogous to a flexible protein binding site
  • The induced pocket-field identifies which pose a new molecule must adopt, and ligand strain is directly modeled
  • Models produce formal estimates of prediction confidence and molecular novelty
  • Very detailed aspects of molecular surface shape, directional hydrogen bonding preferences, and Coulombic electrostatics are learned

Selected Publications

Cleves, A. E., Johnson, S. R., and Jain, A. N. (2021). Synergy and Complementarity between Focused Machine Learning and Physics-Based Simulation in Affinity Prediction. JCIM. Open Access

Cleves, A. E. and Jain, A. N. (2018). Quantitative Surface Field Analysis: Learning Causal Models to Predict Ligand Binding Affinity and Pose. JCAMD, 32, 731-757. Open Access

Cleves, A. E. and Jain, A. N. (2016). Extrapolative prediction using physically-based QSAR. JCAMD, 30(2), 127-152. Open Access

Varela, R., Walters, W. P., Goldman, B. B., and Jain, A. N. (2012). Iterative Refinement of a Binding Pocket Model: Active Computational Steering of Lead Optimization. Journal of Medicinal Chemistry, 55(20), 8926-8942. Open Access

Varela, R., Cleves, A. E., Spitzer, R., and Jain, A. N. (2013). A structure-guided approach for protein pocket modeling and affinity prediction. JCAMD, 27(11), 917-934. Open Access

Jain, A.N., and Cleves, A.E. (2012). Does Your Model Weigh the Same as a Duck? JCAMD, 26, 57-67.

Jain, A.N. (2010). QMOD: Physically Meaningful QSAR. JCAMD, 24, 865-878. Open Access

Langham, J.J., Cleves, A.E., Spitzer, R., Kirshner, D., and Jain, A.N. (2009). Physical Binding Pocket Induction for Affinity Prediction. J Med Chem, 52: 6107-6125.

Cleves, A.E. and Jain, A.N. (2008). Effects of Inductive Bias on Computational Evaluations of Ligand-Based Modeling and on Drug Discovery. JCAMD, 22, 147-159

Cleves, A. E. and Jain, A.N. (2006). Robust Ligand-Based Modeling of the Biological Targets of Known Drugs. J Med Chem, 49, 2921-2938

Jain, A.N., Harris, N.L. & Park, J.Y. (1995). Quantitative binding site model generation: Compass applied to multiple chemotypes targeting the 5-HT1A receptor. J Med Chem 38, 1295-308.

Jain, A.N., Koile, K. & Chapman, D. (1994). Compass: Predicting biological activities from molecular surface properties. Performance comparisons on a steroid benchmark. J Med Chem 37, 2315-27.

Jain, A.N., Dietterich, T.G., Lathrop, R.H., Chapman, D., Critchlow, R.E., Jr., Bauer, B.E., Webster, T.A. & Lozano-Perez, T. (1994). A shape-based machine learning tool for drug design. JCAMD 8, 635-52.