Blue Waters User Portal | Science Teams

Machine Learning Harnesses Molecular Dynamics to Develop Therapeutic Strategies for Alzheimer's Disease and Chronic Pain

Evan Feinberg, Ohio Supercomputer Center

Usage Details

Steven Gordon, Evan Feinberg

Most FDA-approved drugs are small organic molecules that elicit a therapeutic response by binding to a target biological macromolecule. Once bound, small molecule ligands either inhibit the binding of other ligands or allosterically adjust the target’s conformational ensemble. Binding is thus crucial to any behavior of a therapeutic ligand. To maximize a molecule’s therapeutic effect, its affinity for the desired targets, quantified by its binding free energy, must be maximized while its affinity for other macromolecules is minimized. Historically, scientists have used both cheminformatics- and structure-based approaches to model ligands and their targets, and most machine learning approaches use features driven by domain expertise.
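
To make the link between affinity and binding free energy concrete, the standard thermodynamic relation ΔG = RT ln(Kd / c°) can be sketched as below. This is a minimal illustration and is not taken from the project itself: the function name, the temperature of 298.15 K, the 1 M standard concentration, and the example dissociation constants are all assumptions chosen for the example.

```python
import math

# Gas constant in kcal/(mol*K).
R_KCAL = 1.987204e-3


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Return the standard binding free energy in kcal/mol.

    Uses dG = R * T * ln(Kd / c0) with c0 = 1 mol/L, so Kd must be
    given in mol/L. A smaller Kd (tighter binding) gives a more
    negative dG. The function name and defaults are hypothetical.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


# Illustrative (made-up) dissociation constants, not measured data.
for label, kd in [("1 uM", 1e-6), ("1 nM", 1e-9)]:
    print(f"Kd = {label}: dG = {binding_free_energy(kd):.2f} kcal/mol")
```

Under these assumptions, a ligand with a micromolar Kd comes out near -8.2 kcal/mol and one with a nanomolar Kd near -12.3 kcal/mol, so each thousandfold gain in affinity corresponds to roughly 4 kcal/mol of additional binding free energy.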

More recently, deep neural networks (DNNs) have been translated to the molecular sciences. Training most conventional DNN architectures requires vast amounts of data. For example, ImageNet currently contains over 14 million labeled images. In contrast, the largest publicly available data sets for the properties of drug-like molecules include PDBBind 2017, with a little over 4,000 samples of protein–ligand co-crystal structures and associated binding affinity values; Tox21 with nearly 10,000 small molecules and associated toxicity endpoints; QM8 with around 22,000 small molecules and associated electronic properties; and ESOL with a little over 1,000 small molecules and associated solubility values. This scarcity of high-quality scientific data necessitates innovative neural architectures for molecular machine learning.