Drug Discovery - Systems Modeling and Data Analytics Core

New drug development can be divided into two phases: (1) discovery, which tends to be chemistry-centric, and (2) translation, which focuses on animal and clinical therapeutic responses. The “Drug Discovery Funnel” figure to the right illustrates how few drug leads make the journey from initial discovery to final approval in humans. During the first (discovery) phase, only 4% of initial drug leads progress to early animal work. SMDA’s Drug Discovery working group aims to increase success rates in early drug screening and optimization by leveraging large-scale drug datasets and by modifying in vitro validation assays to improve their clinical validity.

In the second (translational) phase, a mere 0.2% of drug leads are approved for use in humans. SMDA’s Translational working group aims to increase success rates in clinical translation by modeling drug response (pharmacokinetics-pharmacodynamics, PK-PD) and by better matching drug leads to the patient populations who will benefit from them (diagnostics).
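The two funnel percentages (4% and 0.2%) can be combined into a rough stage-by-stage estimate. The sketch below assumes both figures are measured against the same starting pool of initial drug leads, which the text does not state explicitly; the function name `funnel_summary` and the pool size of 10,000 leads are illustrative only.

```python
def funnel_summary(initial_leads, frac_to_animal=0.04, frac_approved=0.002):
    """Summarize the funnel for a hypothetical pool of initial drug leads.

    Assumption: both fractions are relative to the initial pool of leads.
    """
    # Leads surviving the discovery phase (progress to early animal work)
    reach_animal = round(initial_leads * frac_to_animal)
    # Leads surviving both phases (approved for use in humans)
    approved = round(initial_leads * frac_approved)
    # Conditional success rate of the translational phase alone
    translation_rate = frac_approved / frac_to_animal
    return {
        "reach_animal": reach_animal,
        "approved": approved,
        "translation_rate": translation_rate,
    }


summary = funnel_summary(10_000)  # 10,000 is an illustrative pool size
print(summary)
```

Under this assumption, 0.2% / 4% = 5%, so roughly 1 in 20 leads that reach animal work would go on to approval.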

Overall, these two working groups strive to fulfill a major strategic goal of SMDA and POH: to better integrate research across the drug development process in order to increase success rates and reduce drug development costs. This page provides an overview of the current priority areas and shared resources of SMDA’s Drug Discovery working group.

Artificial Intelligence in Drug Discovery: Drug-centric Datasets

Five major drug-centric data types are driving efforts to apply artificial intelligence to drug discovery and development. The first data type is the Chemical Properties of Drugs and drug-like molecules. Currently, approximately 10,000 drugs are used clinically in humans, and around 100 million small molecules can be evaluated computationally for their chemical similarity to drug-like molecules. Clinical endpoints that best correlate with chemical properties include absorption (e.g., in the intestines), distribution to different body compartments, metabolism by liver enzymes, and excretion via feces and urine (collectively, ADME). Chemical properties can also be major determinants of toxicity; for example, aromatic and electrophilic molecules can have genotoxic effects on cells.
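Chemical similarity of this kind is often scored by comparing molecular fingerprints. The following is a minimal sketch, assuming each molecule has already been reduced to a set of "on" fingerprint bit indices and using the Tanimoto (Jaccard) coefficient as the similarity measure. The text does not say which fingerprint or similarity measure is used, and the molecule names and bit sets below are made up for illustration.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto (Jaccard) similarity between two fingerprint bit sets."""
    a, b = set(fp_a), set(fp_b)
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def rank_by_similarity(query_fp, library):
    """Rank library molecules by similarity to a query fingerprint."""
    scores = {name: tanimoto(query_fp, fp) for name, fp in library.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical fingerprints: sets of "on" bit indices
known_drug = {1, 4, 7, 9, 12}
candidates = {
    "candidate_A": {1, 4, 7, 9, 13},
    "candidate_B": {2, 3, 5},
    "candidate_C": {1, 4, 7, 9, 12},
}

ranking = rank_by_similarity(known_drug, candidates)
for name, score in ranking:
    print(f"{name}: {score:.2f}")
```

In practice the bit sets would come from a cheminformatics toolkit rather than being typed by hand; the ranking step stays the same.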

The second major type of drug data is Drug-Target Affinities. Approximately 2,000 “druggable” genes exist among the roughly 20,000 protein-coding genes in the human genome. Only this fraction, around 10% of protein-coding genes, is considered druggable because traditional small-molecule drugs require a small “binding pocket” on the target protein to fit via a “lock and key” mechanism. Typical drug-target data types include relative binding affinities from both virtual and experimental screening methods with purified proteins.

Cellular signaling pathways are another critical dataset for understanding drug response in vitro, as they bridge the physical binding of a drug to its target with the downstream therapeutic response (e.g., cell killing for bacteria or cancer). Cellular Drug-response data is the most complex form of drug data available before animal work. Typically, the response of bacterial or cancer cells to around 10,000 drugs can be evaluated using robotic technology that automates traditional drug-response assays conducted in the laboratory.
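Drug-response assays of this kind are commonly summarized as a dose-response curve. The following is a minimal sketch, assuming a standard Hill (sigmoidal) model of cell viability. The model choice, the parameter names, and the example IC50 and concentrations are illustrative assumptions and are not taken from SMDA's assays.

```python
def hill_viability(conc, ic50, hill_slope=1.0, top=1.0, bottom=0.0):
    """Fraction of viable cells at a given drug concentration (Hill model).

    conc and ic50 must share the same units; ic50 is the concentration
    at which viability is halfway between top and bottom.
    """
    if conc <= 0:
        return top  # no drug, no effect
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** hill_slope)


# Hypothetical 10-fold dilution series (micromolar) and a hypothetical IC50
concentrations_uM = [0.01, 0.1, 1.0, 10.0, 100.0]
curve = [hill_viability(c, ic50=1.0) for c in concentrations_uM]

for c, v in zip(concentrations_uM, curve):
    print(f"{c:>7.2f} uM -> viability {v:.3f}")
```

A screen across many drugs would fit `ic50` and `hill_slope` to measured viabilities for each drug; the fitting step is omitted here.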

Tactical Priorities

Recruit Workforce: with interests in medicinal chemistry and chemoinformatics
Train Workforce: to use emerging tools of
- chemoinformatics
- data science
- artificial intelligence
Build Collaborations: between relevant disciplines
- computational chemists
- medicinal chemists
- bioinformaticians
- pharmacologists
Automate: laborious and iterative processes to:
- increase efficiency of research
- increase access to advanced computational methods
Map Community Datasets:
- chemoinformatics
- virtual screening
- drug-target affinity (e.g. kinome-scan)
- high-throughput screening
- clinical drug data

Strategic Priorities

Accessible Chemoinformatics: to facilitate
- Quantitative Structure-Activity Relationship (QSAR) studies
- Molecular Probe Design (e.g. fluorophores, activity probes)
Accessible Virtual Screening: to facilitate
- ab initio drug design
- natural ligand discovery
- drug-lead target identification
Accessible High-Throughput Screening: to facilitate
- drug-lead discovery for specific disease models
- structure-activity relationship (SAR) studies for specific drug-leads
- disease model studies for pre-existing SAR-libraries at UGA
Accessible Pharmacokinetic-Toxicity Analysis:
- computational methods
- in vitro methods
- in vivo methods

Current Members

Eugene Douglass
Uma Singh
Robert Huigens
Jonathan Mochel
Karin Allenspach
Natarajan Kannan
Steve Maher
George Zheng

Software Tools

…

Cleaned Datasets

…

Key Performance Indicators: summary statistics

KPI Category	KPI	Current Value
Research and Publications	Number of Published Papers
	Impact Factor of Journals
	Citations
	Conference Presentations
Collaboration and Engagement	Interdisciplinary Projects
	External Collaborations
	Collaborative Publications	%
	Workshops and Seminars
Funding and Grants	Research Grants Received	$
	Grant Applications Submitted
	Grant Success Rate	%
Data and Tools	Datasets Published
	Software Tools Developed
	Tool Adoption	downloads
Training and Development	Students Supervised
	Training Programs
	Skill Development	certificates
Impact and Outreach	Societal Impact
	Media Mentions
	Public Engagement	events
Operational Efficiency	Project Completion Rate
	Data Management Practices	% compliance
	Resource Utilization	% efficiency
Innovation and Excellence	Awards and Recognitions
	Innovative Solutions	breakthroughs
Feedback and Improvement	Stakeholder Feedback	satisfaction
	Continuous Improvement	# iterations

Key Performance Indicators: specific items list

Research and Publications
- Number of Published Papers
- Impact Factor of Journals
- Citations
- Conference Presentations
Collaboration and Engagement
- Interdisciplinary Projects
- External Collaborations
- Collaborative Publications
- Workshops and Seminars
Funding and Grants
- Research Grants Received
- Grant Applications Submitted
- Grant Success Rate
Data and Tools
- Datasets Published
- Software Tools Developed
- Tool Adoption
Training and Development
- Students Supervised
- Training Programs
- Skill Development
Impact and Outreach
- Societal Impact
- Media Mentions
- Public Engagement
Operational Efficiency
- Project Completion Rate
- Data Management Practices
- Resource Utilization
Innovation and Excellence
- Awards and Recognitions
- Innovative Solutions
Feedback and Improvement
- Stakeholder Feedback
- Continuous Improvement

…