CSAR and Binding MOAD: Two different databases, two different aims, one common goal

Uploaded On 2018-11-03


Two different aims, one common goal: provide the best protein-ligand data. James B. Dunbar Jr. and Heather A. Carlson. 5th Meeting on U.S. Government Chemical Databases and Open Chemistry, August 25th and 26th, 2011.


Presentation Transcript

Slide1

CSAR and Binding MOAD: Two different databases, two different aims, one common goal: provide the best protein-ligand data
James B. Dunbar Jr. and Heather A. Carlson
5th Meeting on U.S. Government Chemical Databases and Open Chemistry
August 25th and 26th, 2011

Slide2

Binding MOAD
Who are we:
Heather Carlson – Principal Investigator
Mark L. Benson
Richard Smith
Nickolay Khazanov
Liegi Hu
Michael Lerner
John Beaver
Brandon Dimcheff
Jason Nerothin
Jayson Falkner
Peter Dresslar
James Dunbar Jr.

Slide3

Binding MOAD

Slide4

Binding MOAD

Slide5

Binding MOAD

Slide6

Binding MOAD
HTML
GATE NLP program
BUDA
Tagged HTML + annotated XML => Scores + text highlights
Web App used to aid in manual biodata extraction and curation
For 2010 update: ~1200 manuscripts to review manually for data
~2800 new PDB structures

Slide7

Binding MOAD
BUDA

Slide8

Binding MOAD
Time-consuming steps:
Obtaining the HTML from the journals – format changes
  Random – different every year
Hand curation of data within BUDA
  Correct data for compound
  Correct sequence for the crystal
BUDA – essential for bookkeeping of the curation process
  Allows for multiple people to work on the curation
  Keeps track of changes and comments on a per-user basis
  Stores all in MySQL – records all work done over the years
  Orders manuscripts so that those most likely to contain data come first

Slide9

CSAR
Community Structure-Activity Resource
CSARdock.org

Specific Aims
SA1. Build the largest, high-quality, freely accessible database of protein-ligand complexes with experimentally determined binding affinities from literature.
SA2. Generate new experimental data: We propose experimentally determining the dissociation constants (Kds) for selected protein-ligand complexes using two complementary techniques: isothermal titration calorimetry (ITC) and surface plasmon resonance (SPR). Consistency between the two approaches would provide confidence in the data. Furthermore, important physicochemical properties for the ligands will be determined (logP/logD, pKa, and solubility), and additional crystal structures will be solved. (Note: actually using ITC, Octet Red, and ThermoFluor – WuXi AppTec is measuring the properties)
SA3. Curate data from the community
SA4. Community outreach

Slide10

CSAR – the people

Who are we:

Principal Investigators
Heather Carlson
Jason Gestwicki
Jeanne Stuckey
Shaomeng Wang

Researchers
William Clay Brown
Krishnapriya Chinnaswamy
James Delproposto
James Dunbar Jr.
Emilio Esposito
You-Na Kang
Ginger Kubish
Richard Smith
Kelly Damm-Ganamet

Who are we (cont.):

Consultants
Philip Andrews
Charles Brooks III
Hollis Showalter
Janet Smith

Web Programming
Shelly Yang

System Administration
Allen Bailey

Advisory Board
Michael Gilson
Philip Hajduk
Paul Labute
Deborah Loughney
Anthony Nicholls
Tudor Oprea
Catherine Peishoff
Peter Preusch
Alexander Tropsha
Janna Wehrle

Slide11

CSAR – dataset example

Slide12

CSAR – compound properties

Slide13

CSAR – crystallography

Slide14

Industrial and In-house efforts
Abbott: 4 datasets in progress
Genentech: Signed CDA, 5 datasets in progress
GSK: Signed CDA
Roche: 1 dataset deposited
BMS and Pfizer: CDA in for legal review
In-house: CDK2 (done), CDK2/cyclinA (final stages), LpxC (ongoing), urokinase and Hsp90 (initial stages)

Slide15

Dataset Selection Process (1)
Analyze target – does it have crystal structures and defined series (expect ~3 or so series) with appropriate biological data?
Analyze crystal data – does each have sufficient data present to refine the density? (.cv, .mtz, scale.log, …) – if so, collect into a directory
Obtain biological data on all compounds tested in the relevant assay and any applicable counter screens (Ki, Ka, IC50; no %inhibition)
Export from corporate database the:
  Structure (SMILES)
  Company identifier
  Biological data for screens – including those in crystal structures
Split data into three types: actives, inactives, crystal structures.
For crystal structures obtain any PDB IDs if available.

Slide16
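Editorial sketch for the last steps of Dataset Selection Process (1): one possible way to split exported records into actives, inactives, and crystal structures, in Python. The slides do not say how "active" was defined, so the record fields, the cutoff value, and the function name below are all illustrative assumptions. Percent-inhibition records are dropped, as the slide specifies.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cutoff: the slides do not state how "active" was defined.
ACTIVE_CUTOFF = 5.0  # pKi / pIC50 of 5 corresponds to 10 uM

# Measurement types kept per the slide; %inhibition is excluded.
ALLOWED_TYPES = {"Ki", "Ka", "IC50"}


@dataclass
class Compound:
    company_id: str
    smiles: str
    measure_type: str             # e.g. "Ki", "IC50", "%inhibition"
    p_value: Optional[float]      # pK(x) or pIC50; None if unavailable
    in_crystal: bool = False
    pdb_id: Optional[str] = None  # recorded when available


def split_compounds(compounds):
    """Partition records into crystal structures, actives, and inactives."""
    groups = {"crystal": [], "active": [], "inactive": []}
    for c in compounds:
        if c.in_crystal:
            groups["crystal"].append(c)
        elif c.measure_type not in ALLOWED_TYPES or c.p_value is None:
            continue  # skip %inhibition and unusable measurements
        elif c.p_value >= ACTIVE_CUTOFF:
            groups["active"].append(c)
        else:
            groups["inactive"].append(c)
    return groups
```

Compounds with a crystal structure are placed in their own group regardless of potency, matching the three-way split on the slide.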

Dataset Selection Process (2)
For the Actives
Move into MOE with pK(x) or pIC50 values:
Wash and calculate physical properties:
  Hydrogen bond acceptors (Acc)
  Hydrogen bond donors (Don)
  Total number for combined Acc and Don
  Heavy atom count
  Rotatable bond count
  SlogP
  TPSA
  Weight
Tag each entry as to series
Select using MOE diverse selection ~40 for each series based on pK(x) or pIC50 and Acc and Don
Check spread in other characteristics to be sure they are not skewed and by eye verify a spread in available chemical functionality.

Slide17
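Editorial sketch for Dataset Selection Process (2): the slides use MOE's diverse selection, whose algorithm is not described in the source. As a stand-in, the Python below uses greedy MaxMin picking over the three descriptors named on the slide (pK(x) or pIC50, Acc, Don). MaxMin is a swapped-in, commonly used diversity heuristic and is not claimed to match MOE. The function names and the min-max scaling step are assumptions. The conversion pIC50 = -log10(IC50 in mol/L) is the standard definition.

```python
import math


def p_value(conc_molar):
    """Convert a Ki or IC50 in mol/L to pKi or pIC50 (-log10)."""
    return -math.log10(conc_molar)


def scale_columns(rows):
    """Min-max scale each descriptor column to [0, 1] so none dominates."""
    scaled_cols = []
    for col in zip(*rows):
        lo, hi = min(col), max(col)
        span = (hi - lo) or 1.0  # avoid dividing by zero for constant columns
        scaled_cols.append([(v - lo) / span for v in col])
    return [tuple(r) for r in zip(*scaled_cols)]


def maxmin_pick(rows, n_pick, seed_index=0):
    """Greedy MaxMin: repeatedly add the point farthest from those picked.

    rows: one (pActivity, Acc, Don) tuple per compound.
    Returns the indices of the picked compounds, in pick order.
    """
    pts = scale_columns(rows)
    picked = [seed_index]
    # Distance from every point to its nearest already-picked point.
    nearest = [math.dist(p, pts[seed_index]) for p in pts]
    while len(picked) < min(n_pick, len(pts)):
        candidates = [i for i in range(len(pts)) if i not in picked]
        nxt = max(candidates, key=lambda i: nearest[i])
        picked.append(nxt)
        nearest = [min(d, math.dist(p, pts[nxt])) for d, p in zip(nearest, pts)]
    return picked
```

For a series of actives, a call such as `maxmin_pick(rows, 40)` would mirror the "~40 for each series" target. The slide's follow-up check (spread in the other properties, plus inspection by eye) still applies to whatever subset comes out.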

Dataset Selection Process (3)
For the Actives (identify previous release of compound data)
Extract the ligands for the target from BindingDB/ChEMBL and load into MOE, then export SMILES.
Export the selected set from MOE (all fields) into text with structure as SMILES.
In Pipeline Pilot – using canonicalized SMILES – check to see if any selected is in BindingDB. If yes – select suitable replacement – if not then selection stands.

Slide18
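Editorial sketch for Dataset Selection Process (3): the slides perform the overlap check in Pipeline Pilot. The Python below shows the same idea as a set lookup, assuming every SMILES string has already been canonicalized by one and the same toolkit (canonical SMILES from different toolkits generally do not match as strings). The function name and the input layout are hypothetical.

```python
def flag_previously_released(selected, public_smiles):
    """Split a selection into already-public and still-unpublished compounds.

    selected: list of (company_id, canonical_smiles) pairs.
    public_smiles: iterable of canonical SMILES exported from a public
        source such as BindingDB or ChEMBL.
    Returns (released, unreleased), each a list of (company_id, smiles).
    """
    public = set(public_smiles)
    released = [(cid, smi) for cid, smi in selected if smi in public]
    unreleased = [(cid, smi) for cid, smi in selected if smi not in public]
    return released, unreleased
```

Each compound in `released` would then need a replacement. The slide leaves the choice of a "suitable replacement" to the curator, so that step is not automated here.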

Dataset Selection Process (4)
Find the Inactives
Many should be extremely similar to crystal structure
Using Pipeline Pilot, search the inactives with the SMILES from the crystal structure to find those very similar to known crystal structure.
  MDL public keys with 0.85 to 0.99 as range.
Select ~10
If ~10 are found then check BindingDB (Pipeline Pilot) for any that are in literature. If yes – select suitable replacement – else selection stands
If only 1 or 2 (or none) then continue in MOE

Slide19
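Editorial sketch for Dataset Selection Process (4): the slide gives a similarity range of 0.85 to 0.99 on MDL public keys but does not name the metric. The Python below assumes Tanimoto similarity, the usual choice for key-based fingerprints. Fingerprints are represented as sets of on-bit indices, and generating the keys themselves is out of scope. All names are hypothetical.

```python
def tanimoto(keys_a, keys_b):
    """Tanimoto similarity between two sets of on-bit indices."""
    a, b = set(keys_a), set(keys_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def near_neighbors(query_keys, inactives, low=0.85, high=0.99, limit=10):
    """Return up to `limit` inactives whose similarity to the query lies
    in [low, high], most similar first.

    inactives: dict mapping compound id -> set of on-bit indices.
    """
    hits = []
    for cid, keys in inactives.items():
        sim = tanimoto(query_keys, keys)
        if low <= sim <= high:
            hits.append((cid, sim))
    hits.sort(key=lambda h: h[1], reverse=True)
    return hits[:limit]
```

The upper bound of 0.99 keeps out compounds whose keys are identical to the crystal ligand's, and the lower bound keeps the picks close analogs. If only one or two hits come back, the slide's fallback is the MOE-based selection.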

Dataset Selection Process (5)
For the Inactives not extremely similar to crystal structure
Move into MOE:
Wash and calculate physical properties:
  Hydrogen bond acceptors (Acc)
  Hydrogen bond donors (Don)
  Total number for combined Acc and Don
  Heavy atom count
  Rotatable bond count
  SlogP
  TPSA
  Weight
Tag each entry as to series
Select using MOE diverse selection ~10 for each series based on Acc and Don
Check spread in other characteristics to be sure they are not skewed and by eye verify a spread in available chemical functionality.
Check BindingDB (Pipeline Pilot) for any that are in literature.

Slide20

Datasets – lessons learned
Biological data
  Attention to the details – LpxC – just enough Zn to be active (catalytic site), but not enough to cause inhibition from secondary inhibitory site for Zn
  Need to be aware of inherent error limits
  Solubility can be a big issue
    Particularly how it is handled, i.e. filtered solids from ligand before injecting into ITC
  Protocols – did they use exactly what was published
  Store output from assays in PDF – spectra, etc.
    Allow end users to see and judge what they want to include for themselves
Crystallography – check the quality, provide density
  Many different metrics – for us RSCC (real space correlation coefficient) for ligand is very important – but we use several
Setting up lots of proteins for docking and scoring can be a bear
Getting approval of legal departments – very time consuming
  Initial confidentiality agreement
  Approval of individual compounds for release

Slide21
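Editorial sketch for the RSCC metric named on the lessons-learned slide: the real space correlation coefficient is commonly computed as the Pearson correlation between observed and calculated electron density, sampled at grid points around the ligand. The Python below assumes both densities have already been sampled into equal-length lists. Reading map files and choosing the grid points are out of scope, and the function name is hypothetical.

```python
import math


def rscc(rho_obs, rho_calc):
    """Pearson correlation between observed and calculated density values."""
    n = len(rho_obs)
    if n != len(rho_calc) or n < 2:
        raise ValueError("need two equal-length samples with at least 2 points")
    mean_o = sum(rho_obs) / n
    mean_c = sum(rho_calc) / n
    cov = sum((o - mean_o) * (c - mean_c) for o, c in zip(rho_obs, rho_calc))
    var_o = sum((o - mean_o) ** 2 for o in rho_obs)
    var_c = sum((c - mean_c) ** 2 for c in rho_calc)
    if var_o == 0 or var_c == 0:
        raise ValueError("density is constant; correlation is undefined")
    return cov / math.sqrt(var_o * var_c)
```

Values near 1 mean the modeled ligand density tracks the experimental map closely. The slide notes that several metrics are used alongside this one.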

Thank you. Any comments or questions?