SOFTWARE  Open Access

Open Babel: An open chemical toolbox

Noel M O’Boyle 1, Michael Banck 2, Craig A James 3, Chris Morley 4, Tim Vandermeersch 4 and Geoffrey R Hutchison 5*

Abstract

Background: A frequent problem in computational modeling is the interconversion of chemical structures between different formats. While standard interchange formats exist (for example, Chemical Markup Language) and de facto standards have arisen, the need to interconvert formats is a continuing problem due to the multitude of different application areas for chemistry data, differences in the data stored by different formats (0D versus 3D, for example), and competition between software along with a lack of vendor-neutral formats.

Results: We discuss, for the first time, Open Babel, an open-source chemical toolbox that speaks the many languages of chemical data. Open Babel version 2.3 interconverts over 110 formats. The need to represent such a wide variety of chemical and molecular data requires a library that implements a wide range of cheminformatics algorithms, from partial charge assignment and aromaticity detection, to bond order perception and canonicalization. We detail the implementation of Open Babel, describe key advances in the 2.3 release, and outline a variety of uses both in terms of software products and scientific research, including applications far beyond simple format interconversion.

Conclusions: Open Babel presents a solution to the proliferation of multiple chemical file formats. In addition, it provides a variety of useful utilities from conformer searching and 2D depiction, to filtering, batch conversion, and substructure and similarity searching. For developers, it can be used as a programming library to handle chemical data in areas such as organic chemistry, drug design, materials science, and computational chemistry. It is freely available under an open-source license from http://openbabel.org.
* Correspondence: geoffh@pitt.edu
5 University of Pittsburgh, Department of Chemistry, 219 Parkman Avenue, Pittsburgh, PA 15217, USA
Full list of author information is available at the end of the article

O’Boyle et al. Journal of Cheminformatics 2011, 3:33
http://www.jcheminf.com/content/3/1/33

© 2011 O’Boyle et al; licensee Chemistry Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Introduction

The history of chemical informatics has included a huge variety of textual and computer representations of molecular data. Such representations focus on specific atomic or molecular information and may not attempt to store all possible chemical data. For example, line notations like Daylight SMILES [1] do not offer coordinate information, while crystallographic or quantum mechanical formats frequently do not store chemical bonding data. Hydrogen atoms are frequently omitted from x-ray crystallographic data because their coordinates are difficult to establish, and are often ignored by some file formats as the “implicit valence” of heavy atoms that indicates their presence. Other types of representations require specification of atom types on the basis of a specific valence bond model, inclusion of computed partial charges, indication of biomolecular residues, or multiple conformations.

While attempts have been made to provide a standard format for storing chemical data, including most notably the development of Chemical Markup Language (CML) [2-6], an XML dialect, such formats have not yet achieved widespread use. Consequently, a frequent problem in computational modeling is the interconversion of molecular structures between different formats, a process that involves extraction and interpretation of their chemical data and semantics.

We outline, for the first time, the development and use of the Open Babel project, a full-featured open chemical toolbox designed to “speak” the many different representations of chemical data. It allows anyone to search, convert, analyze, or store data from molecular modeling, chemistry, solid-state materials, biochemistry, or related areas. It provides both ready-to-use programs and a complete, extensible programmer’s toolkit for developing cheminformatics software. It can handle reading, writing, and interconverting over 110 chemical file formats, supports filtering and searching molecule files using Daylight SMARTS pattern matching [7] and other methods, and provides extensible fingerprinting and molecular mechanics frameworks. We will discuss the frameworks for file format interconversion, fingerprinting, fast molecular searching, bond perception and atom typing, canonical numbering of molecular structures and fragments, molecular mechanics force fields, and the extensible interfaces provided by the software library to enable further chemistry software development.

Open Babel has its origin in a version of OELib released as open-source software by OpenEye Scientific under the GPL (GNU General Public License). In 2001, OpenEye decided to rewrite OELib in-house as the proprietary OEChem library, so the existing code from OELib was spun out into the new Open Babel project. Since 2001, Open Babel has been developed and substantially extended as an international collaborative project using an open-source development model [8]. It has over 160,000 downloads, over 400 citations [9], is used by over 40 software projects [10], and is freely available from the Open Babel website [11].

Features

File Format Support

With the release of Open Babel 2.3, Open Babel supports 111 chemical file formats in total. It can read 82 formats and write 85 formats. These encompass common formats used in cheminformatics (SMILES, InChI, MOL, MOL2), input and output files from a variety of computational chemistry packages (GAMESS, Gaussian, MOPAC), crystallographic file formats (CIF, ShelX), reaction formats (MDL RXN), file formats used by molecular dynamics and docking packages (AutoDock, Amber), formats used by 2D drawing packages (ChemDraw), 3D viewers (Chem3D, Molden) and chemical kinetics and thermodynamics (ChemKin, Thermo). Formats are implemented as plugins in Open Babel, which makes it easy for users to contribute new file formats (see Extensible Interface below). Depending on the format, other data is extracted by Open Babel in addition to the molecular structure; for example, vibrational frequencies are extracted from computational chemistry log files, unit cell information is extracted from CIF files, and property fields are read from SDF files.

A number of utility file formats are also defined; these are not, strictly speaking, a way of storing the molecular structure, but rather present certain functionality through the same interface as the regular file formats. For example, the report format is a write-only utility format [12] that presents a summary of the molecular structure of a molecule; the fingerprint format [13] and fastsearch format [14] are used for similarity and substructure searching (see below); the MolPrint2D and Multilevel Neighborhoods of Atoms formats calculate the circular fingerprints defined by Bender et al. [15,16] and in refs [17,18], respectively.

Each format can have multiple options to control either reading or writing a particular format. For example, the InChI format has 12 options, including an option to generate an InChIKey, an option (“T <param>”) to truncate the InChI depending on a supplied parameter, and an option to ignore certain InChI warnings. The available options are listed in the documentation, are shown in the Graphical User Interface (GUI) as checkboxes or textboxes, and can be listed at the command-line. In fact, all three are generated from the same source: a documentation string in the C++ code.

Fingerprints and Fast Searching

Databases are widely used to store chemical information, especially in the pharmaceutical industry. A key requirement of such a database is the ability to index chemical structures so that they can be quickly retrieved given a query substructure. Open Babel provides this functionality using a path-based fingerprint. This fingerprint, referred to as FP2 in Open Babel, identifies all linear and ring substructures in the molecule of lengths 1 to 7 (excluding the 1-atom substructures C and N) and maps them onto a bit-string of length 1024 using a hash function. If a query molecule is a substructure of a target molecule, then all of the bits set in the query molecule will also be set in the target molecule. The fingerprints for two molecules can also be used to calculate structural similarity using the Tanimoto coefficient, the number of bits in common divided by the union of the bits set.

Clearly, repeated searching of the same set of molecules will involve repeated use of the same set of fingerprints. To avoid the need to recalculate the fingerprints for a particular multi-molecule file (such as an SDF file), Open Babel provides a format that solely stores a fingerprint along with an index into the original file. This index leads to a rapid increase in the speed of searching for matches to a query; datasets with several million molecules are easily searched interactively. In this way, a multi-molecule file may be used as a lightweight alternative to a chemical database system.

Bond Perception and Atom Typing

As mentioned above, many chemical file formats offer representations of molecular data solely as lists of atoms. For example, most quantum chemical software packages and most crystallographic file formats do not offer definitions of bonding. A similar situation occurs in the case of the Protein Data Bank (PDB) format; while standardized [19] files contain connectivity
information, non-standard files exist that often do not provide full connectivity information. Consequently, Open Babel features methods to determine bond connectivity, bond order perception, aromaticity determination, and atom typing.

Bond connectivity is determined by the frequently used algorithm of detecting atoms closer than the sum of their covalent radii, with a slight tolerance (0.45 Å) to allow for longer than typical bonds. To handle disorder in crystallographic data (e.g., PDB or CIF files), atoms closer than 0.63 Å are not bonded. A further filtering pass is made to ensure standard bond valency is maintained; each element has a maximum number of bonds, and if this is exceeded then the longest bonds to an atom are successively removed until the valence rule is satisfied.

After bond connectivity is determined, if needed or requested by the user, bond order perception is performed on the basis of bond angles and geometries. The method is similar to that proposed by Roger Sayle [20] and uses the average bond angle around an un-typed atom to determine sp and sp2 hybridized centers. 5-membered and 6-membered rings are checked for planarity to estimate aromaticity. Finally, atoms marked as unsaturated are checked for an unsaturated neighbor to give a double or triple bond. After this initial atom typing, known functional groups are matched, followed by aromatic rings, followed by remaining unsatisfied bonds based on a set of heuristics for short bonds, atomic electronegativity, and ring membership.

Atom typing is performed by lazy evaluation, matching atoms against SMARTS patterns to determine hybridization, implicit valence, and external atom types. Atom type perception may be triggered by adding hydrogens (which requires determination of implicit and explicit valence), exporting to a file format that requires atom types, or as requested by the user. To minimize the amount of typing required, when importing from a format with atom types specified, a lookup table is used to translate between equivalent types.

An important part of atom typing is aromaticity detection and assignment of Kekulé bond orders (kekulization). In Open Babel, a central aromaticity model is used, largely matching the commonly used Daylight SMILES representation [1], but with added support for aromatic phosphorus and selenium. Potential aromatic atoms and bonds are flagged on the basis of membership in a ring system possibly containing 4n+2 π electrons. Aromaticity is established only if a well-defined valence bond Kekulé pattern can be determined. To do this, atoms are added to a ring system and checked against the 4n+2 π electron configuration, gradually increasing the size to establish the largest possible connected aromatic ring system. Once this ring system is determined, an exhaustive search is performed to assign single and double bonds to satisfy all valences in a Kekulé form. Since this process is exponential in complexity, the algorithm will terminate if more than 30 levels of recursion or 15 seconds are exceeded (which may occur in the case of large fused ring systems such as carbon nanotubes).

Canonical Representation of Molecules

In general, for any particular molecular structure and file format, there are a large number of possible ways the structure could be stored; for example, there are N! ways of ordering the atoms in an MOL file. While each of the orderings encodes exactly the same information, it can be useful to define a canonical numbering of the atoms of a molecule and use this to derive a canonical representation of a molecule for a particular file format. For a zero-dimensional file format without coordinates, such as SMILES, the canonical representation could be used to index a database, remove duplicates or search for matches.

Open Babel implements a sophisticated canonicalization algorithm that can handle molecules or molecular fragments. The atom symmetry classes are the initial graph invariants and encode topological and chemical properties. A cooperative labeling procedure is used to investigate the automorphic permutations to find the canonical code. Although the algorithm is similar to the original Morgan canonical code [21], various improvements are implemented to improve performance. Most notably, the algorithm implements heuristics from the popular nauty package [22,23]. Another aspect handled by the canonical code is stereochemistry, as different labelings can lead to different parities. This is further complicated by the possibility of symmetry-equivalent stereocenters and stereocenters whose configuration is interdependent. The full details will be the subject of a separate publication.

Coordinate Generation in 2D and 3D

Open Babel, version 2.3, has support for 2D coordinate generation (Figure 1) through the donation of code by Sergei Trepalin, based on the code used in the MCDL chemical structure editor [24-26]. The MCDL algorithm aims to lay out the molecular structure in 2D such that all bond lengths are equal and all bond angles are close to 120°. The layout algorithm includes a small database of around 150 templates to help lay out cages and large fragment cycles. To deal with the problem of overlapping fragments, the algorithm includes an exhaustive search procedure that rotates around acyclic bonds.

Coordinate generation in 3D was introduced in Open Babel version 2.2, and improved in version 2.3, to enable conversion from 0D formats such as SMILES to 3D formats such as SDF (Figure 1). The 3D structure generator builds linear components from scratch following geometrical rules based on the hybridization of the atoms. Single-conformer ring templates are used for ring systems. The template matching algorithm iterates through the templates from largest to smallest searching for matches. If a match is found, the algorithm continues but will not match any ring atoms previously templated except in the case of a single overlap (the two ring systems of a spiro group) or an overlap involving exactly two adjacent atoms (two fused ring systems). After an initial structure is generated, the stereochemistry (cis/trans and tetrahedral) is corrected to match the input structure. Finally, the energy of the structure is minimized using the MMFF94 force field [27-31] and a low energy conformer found using a weighted rotor search.

While the 3D structure builder produces reasonable conformations for molecules without rings or with ring systems for which a template exists, the results may be poor for molecules with more complex ring systems or organometallic species. Future work will be performed to compare the results of Open Babel with other programs with respect to both speed and the quality of the generated structures [32].
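The overlap rule used when matching ring templates (largest first; a later template may share one atom with earlier ones, or exactly two adjacent atoms) can be sketched in a few lines of Python. This is an illustration only, not Open Babel's implementation: the function name `select_ring_templates`, the representation of a template match as a set of atom indices, and the bond set are all assumptions made for the example.

```python
def select_ring_templates(matches, bonds):
    """Greedily accept ring-template matches, largest first.

    matches: iterable of sets of atom indices, one per candidate match
             (a hypothetical representation, not Open Babel's own).
    bonds:   set of frozenset({i, j}) pairs describing bonded atoms.
    """
    accepted = []
    templated = set()  # ring atoms already covered by an accepted template
    for match in sorted(matches, key=len, reverse=True):
        overlap = set(match) & templated
        if len(overlap) <= 1:
            ok = True                          # no overlap, or a spiro atom
        elif len(overlap) == 2:
            ok = frozenset(overlap) in bonds   # fused rings share a bond
        else:
            ok = False                         # larger overlaps are skipped
        if ok:
            accepted.append(set(match))
            templated |= set(match)
    return accepted


# Toy example: a six-membered ring, a five-membered ring fused to it through
# atoms 4 and 5, and a candidate sharing three atoms with the first ring.
bonds = {frozenset(p) for p in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
                                (5, 6), (6, 7), (7, 8), (8, 4)]}
six_ring = {0, 1, 2, 3, 4, 5}
five_ring = {4, 5, 6, 7, 8}
three_atom_overlap = {3, 4, 5, 9}
chosen = select_ring_templates([five_ring, six_ring, three_atom_overlap], bonds)
print(chosen)
```

In this toy case the six-membered and five-membered rings are both accepted and the third candidate is rejected.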
Stereochemistry

A recent focus of Open Babel development has been to ensure robust translation of stereochemical information between file formats. This is particularly important when dealing with 0D formats as these explicitly encode the perceived stereochemistry. Open Babel 2.3 includes classes to handle cis/trans double bond stereochemistry, tetrahedral stereochemistry and square-planar stereochemistry (this last is still under development), as well as perception routines for 2D and 3D geometries, and routines to query and alter the stereochemistry.

The detection of stereogenic units starts with an analysis of the graph symmetry of the molecule to identify the symmetry class of each atom. However, given that a complete symmetry analysis also needs to take stereochemistry into account, this means that the overall stereochemistry can only be found iteratively. At each iteration, the current atom symmetry classes are used to identify stereogenic units. For example, a tetrahedral center is identified as chiral if it has four neighbors with different symmetry classes (or three, in the case where a lone pair gives rise to the tetrahedral shape).

Forcefields

Molecular mechanics functions are provided for use with small molecules. Typical applications include energy evaluation or minimization, alone or as part of a larger workflow. The selection of implemented force fields allows most molecular structures to be used and parameters to be assigned automatically. The MMFF94(s) force field can be used for organic or drug-like molecules [27-31]. For molecules containing any element of the periodic table or complex geometry (i.e. not supported by MMFF94), the UFF force field can be used instead [33]. Recently, code implementing the GAFF force field [34,35] was also contributed and released as part of version 2.3. All of the force fields allow the application of constraints on particular atom positions, or particular distances.
Figure 1 Interconversion of 0D, 2D and 3D structures. The structures shown are of sertraline, a selective serotonin reuptake inhibitor (SSRI) used in the treatment of depression. A SMILES string for sertraline is shown at the top; this can be considered a 0D structure (only connectivity and stereochemical information). From this, Open Babel can generate a 2D structure (bottom left, depicted by Open Babel) or a 3D structure (bottom right, depicted by Avogadro), and all of these can be interconverted.

Several conformer searching methods have been implemented using the force fields, all based on the “torsion-driving” approach. This approach involves setting torsion angles from a set of predefined allowed values for a particular rotatable bond. The most thorough search method implemented is a systematic search method, which iterates over all of the allowed torsion angles for each rotatable bond in the molecule and retains the conformer with the lowest energy. Since a systematic search may not be feasible for a molecule with multiple rotatable bonds, a number of stochastic search methods are also available: the random search method, which tries random settings for the torsion angles (from the predefined allowed values), and a weighted rotor search, a stochastic search method that converges on a low energy conformer by weighting particular torsion angles based on the relative energy of the generated conformer. With Open Babel 2.3, conformer search based on a genetic algorithm is also available, which allows the application of filters (e.g. a diversity filter) and different scoring functions. This latter method can be used to generate a library of diverse conformers, or, like the other methods, to seek a low energy conformer [36].
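The difference between the systematic and random torsion-driving searches can be illustrated with a small, self-contained Python sketch. Everything here is an assumption made for illustration: the allowed angles, the function names, and in particular `toy_energy`, which is a stand-in for a real force-field energy and has no chemical meaning.

```python
import itertools
import math
import random

ALLOWED_ANGLES = (60.0, 180.0, 300.0)  # assumed allowed torsion values, degrees


def toy_energy(torsions):
    """Hypothetical stand-in for a force-field energy; favours 180 degrees."""
    return sum(1.0 + math.cos(math.radians(t)) for t in torsions)


def systematic_search(n_rotors, energy=toy_energy, allowed=ALLOWED_ANGLES):
    """Try every combination of allowed torsion values; keep the lowest energy."""
    return min(itertools.product(allowed, repeat=n_rotors), key=energy)


def random_search(n_rotors, n_trials, energy=toy_energy,
                  allowed=ALLOWED_ANGLES, seed=0):
    """Sample random torsion settings from the allowed values."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        candidate = tuple(rng.choice(allowed) for _ in range(n_rotors))
        if best is None or energy(candidate) < energy(best):
            best = candidate
    return best


best_systematic = systematic_search(3)          # evaluates 3**3 = 27 settings
best_random = random_search(3, n_trials=10)     # evaluates only 10 settings
print(best_systematic, toy_energy(best_systematic))
print(best_random, toy_energy(best_random))
```

The systematic search evaluates every combination, so its cost grows as the number of allowed values raised to the number of rotatable bonds; the random search has a fixed budget but is not guaranteed to find the lowest-energy setting.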
Implementation

Technical Details

Open Babel is implemented in standards-compliant C++. This ensures support for a wide variety of C++ compilers (MSVC, GCC, Intel Compiler, MinGW, Clang), operating systems (Windows, Mac OS X, Linux, BSD, Windows/Cygwin) and platforms (32-bit, 64-bit). Since version 2.3, it is compiled using the CMake build system [37,38]. This is an open-source cross-platform build system with advanced features for dependency analysis. The build system has an associated unit test framework, CTest, which allows nightly builds to be compiled and tested automatically with the results collated and displayed on a centralized dashboard [39].

To simplify installation, Open Babel has as few external dependencies as possible. Where such dependencies exist, they are optional. For example, if the XML development libraries are not available, Open Babel will still compile successfully but none of the XML formats (such as Chemical Markup Language, CML) will be available. Similarly, if the Eigen matrix and linear algebra library is not found, any classes that require fast matrix manipulation (such as OBAlign, which performs least squares alignment) will not be compiled.

While the majority of the Open Babel library is written in C++, bindings have been developed for a range of other programming languages, including Java and the .NET platform, as well as the so-called “dynamic” scripting languages Perl, Python, and Ruby. These are automatically generated from the C++ header files using the SWIG tool. As described previously [40], in the case of Python an additional module is provided named Pybel that simplifies access to the C++ bindings. These interfaces facilitate development of web-enabled chemistry applications, as well as rapid development and prototyping.

Code Architecture

The Open Babel codebase has a modular design as shown in Figure 2. The goal of this design is threefold:

1. To separate the chemistry, the conversion process and the user interfaces, reducing, as far as possible, the dependency of one upon another.
2. To put all of the code for each chemical format in one place (usually a single file) and make the addition of new formats simple.
3. To allow the format conversion of not just molecules, but also any other chemical objects, such as reactions.

Figure 2 Architecture of the Open Babel codebase.

The code base can be considered as consisting of the following modules (Figure 2):

• The Chemical Core, which contains OBMol etc. and has all of the chemical structure description and manipulation. This is the heart of the application and its API can be used as a chemical toolbox. It has no input/output capabilities.
• The Formats, which read and write to files of different types. These classes are derived from a common base class, OBFormat, which is in the Conversion Control module. They also make use of the chemical routines in the Chemical Core module. Each format file contains a global object of the format class. When the format is loaded the class constructor registers the presence of the class with OBConversion. This means that the formats are plugins: new formats can be added without changing any framework code.
• Common Formats include OBMoleculeFormat and XMLBaseFormat, from which most other formats (like Format A and Format B in the diagram) are derived. Independent formats like Format C are also possible.
• The Conversion Control, which also keeps track of the available formats, the conversion options and the input and output streams. It can be compiled without reference to any other parts of the program. In particular, it knows nothing of the Chemical Core: mol.h is not included.
• The User Interface, which may be a command line application, a Graphical User Interface (GUI), or may be part of another program that uses Open Babel’s input and output facilities. This depends only on the Conversion Control module (obconversion.h is included), but not on the Chemical Core or on any of the Formats.
• The Fingerprint API, as well as being usable in external programs, is employed by the fastsearch and fingerprint formats.
• The Fingerprints, which are bit arrays that describe an object and which facilitate fast searching. They are also built as plugins, registering themselves with their base class OBFingerprint, which is in the Fingerprint API.
• Other features such as Forcefields, Partial Charge Models and Chemical Descriptors, although not shown in the diagram, are handled similarly to Fingerprints.
• The Error Handling can be used throughout the program to log and display errors and warnings.

Extensible Interface

The utility of software libraries such as Open Babel depends on the ability of the design to be extended over time to support new functionality. To facilitate this, Open Babel implements a plugin interface for file formats, fingerprints, charge models, descriptors, “operators” and molecular mechanics force fields. This ensures a clean separation of the implementation of a particular plugin from the core Open Babel library code, and makes it easy for a new plugin (e.g. a new file format) to be contributed; all that is needed is a single C++ file and a trivial change to one of the build files. The operator plugins provide a very general mechanism for operating on a molecule (e.g. energy minimization or 3D coordinate generation) or on a list of molecules (e.g. filtering or sorting) after reading but before writing.

Plugins are dynamically loaded at runtime. This decreases the overall disk and memory footprint of Open Babel, allowing external developers to choose particular functionality needed for their application and ignore other, less relevant features. It also allows the possibility of a third party distributing plugins separately from the Open Babel distribution to provide additional functionality.

Open-Source License and Open Development

Open Babel is open-source software, which offers end users and third-party developers a range of additional rights not granted by proprietary chemistry software. Open-source software, at its most basic level, grants users the rights to study how their software works, to adapt it for any purpose or otherwise modify it, and to share the software and their modifications with others. In this sense, Open Source functions in similar ways to the processes of open peer review, publication, and citation in science. The rights granted by open source licenses largely coincide with the norms of scientific ethics to enable verifiability, repeatability, and building on previous results and theories.
Beyond these rights, Open Babel (like most other open-source projects) offers open development; that is, all development occurs in public forums and with public code repositories. This results in greater input from the community as any user can easily submit bug reports or feature suggestions, get involved in discussions on the future direction of Open Babel or even become a developer him/herself. In practice, the number of active contributors has increased over time through this level of open, public development (Figure 3). Moreover, it means that the development of the code is completely transparent and the quality of the software is available for public scrutiny. Indeed, since its inception, over 658 bugs have been submitted to the public tracker and fixed [41].

Figure 3 Number of contributors over time. Note that this graph only includes developers who directly committed code to the Open Babel source code repository, and does not include patches provided by users.

Validation and Testing

Open Babel includes an extensive test suite comprising 60 different test programs, each with tens to hundreds of tests. In early 2010, a nightly build infrastructure and dashboard was put in place with support from Kitware, Inc. This has greatly improved code quality by catching regressions, and also ensures that the code compiles cleanly on all platforms and compilers supported by Open Babel. Some examples of tests that are run each night are:

(1) The MMFF94 force field code is tested against the MMFF94 validation suite.
(2) The OBAlign class, which was developed using Test-Driven Development (TDD) methodology, is run against its test suite.
(3) Handling of symmetry is validated by converting several test cases between SMILES, 2D and 3D SDF, and InChI (there are also several test programs with unit tests for the individual stereo classes in the API).
(4) The SMARTS parser is tested using over 250 valid and invalid SMARTS patterns, and the SMARTS matcher is tested using 125 basic SMARTS patterns.
(5) The LSSR (Least Set of Smallest Rings) code is tested for invariance against changing the atom order for a series of polycyclic molecules.
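The idea behind an atom-order invariance test such as (5) can be sketched without any chemistry library. In the example below the property under test is only the number of rings in a smallest ring set (bonds minus atoms plus connected components), which stands in for real ring perception code; the function names and the naphthalene bond list are assumptions made for illustration, not part of the Open Babel test suite.

```python
import random


def ring_count(n_atoms, bonds):
    """Rings in a smallest ring set: bonds - atoms + connected components."""
    parent = list(range(n_atoms))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    components = n_atoms
    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b
            components -= 1
    return len(bonds) - n_atoms + components


def shuffled(n_atoms, bonds, rng):
    """Renumber the atoms with a random permutation and reorder the bonds."""
    perm = list(range(n_atoms))
    rng.shuffle(perm)
    new_bonds = [(perm[a], perm[b]) for a, b in bonds]
    rng.shuffle(new_bonds)
    return new_bonds


# Naphthalene skeleton: 10 atoms, 11 bonds, two fused six-membered rings.
NAPHTHALENE = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0),
               (4, 6), (6, 7), (7, 8), (8, 9), (9, 5)]

rng = random.Random(42)
reference = ring_count(10, NAPHTHALENE)
results = [ring_count(10, shuffled(10, NAPHTHALENE, rng)) for _ in range(10)]
print(reference, results)
```

A real test would compare the perceived ring sets themselves (after mapping atom indices back), but the structure is the same: compute a reference result, permute the input, and require the answer to be unchanged.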
Recently the development team has placed a major focus on increasing the robustness of file format translation, particularly in relation to the commonly used SMILES and MDL Mol file formats. Translating between these formats requires accurate stereochemistry perception, inference of implicit hydrogens, and kekulization of delocalized systems. While it is difficult to ensure that any complex piece of code is free of bugs, and Open Babel is no exception, validation procedures can be carried out to assess the current level of performance and to find additional test cases that expose bugs. The following procedure was used to guide the rewriting of stereochemistry code in Open Babel, a project that began in early 2009. Starting with a dataset of 18,084 3D structures from PubChem3D as an SDF file, we compared the result of (a) conversion to SMILES, followed by conversion of that to Canonical SMILES to (b) conversion directly to Canonical SMILES. This procedure can be used to flush out errors in reading the original SDF file, reading/writing SMILES (either due to stereochemistry errors or kekulization problems), and is also a test (to some extent) of the canonicalization code.

At the time of starting this work (March 2009), the error rate found was 1424 (8%); by October 2009, combined work on stereochemistry, kekulization and canonicalization had reduced this to 190 (~1%), and continued improvements have reduced the number of errors down to two (shown in Figure 4) for Open Babel 2.3.1 (~0.01%). The first failure is due to a kekulization error in a polycyclic aromatic molecule incorporating heteroatoms: (a) gave c1ccc2c(c1)c1[nH][nH]c3c4c1c(c2)ccc4cc1c3cccc1 while (b) gave c1ccc2c(c1)c1nnc3c4c1c(c2)ccc4cc1c3cccc1. This error led to confusion over whether or not the aromatic nitrogens have hydrogens attached (they do not). The second failure involves confusion over the canonical stereochemistry at a bridgehead carbon: (a) gave C1CN2[C@@H](C1)CCC2 while (b) gave C1CN2[C@H](C1)CCC2. This is actually a meso compound and so both SMILES strings are correct and represent the same molecule. However, the canonicalization algorithm should have chosen one stereochemistry or the other for the canonical representation.
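The comparison of route (a) with route (b), and the percentages quoted above, can be written down as a short Python sketch. The three converter callables are placeholders to be supplied by the caller (for example, thin wrappers around a cheminformatics toolkit); the function names are assumptions made for this illustration and are not part of Open Babel.

```python
def round_trip_failures(records, to_smiles, smiles_to_canonical, to_canonical):
    """Return the records for which routes (a) and (b) disagree.

    (a) record -> SMILES -> canonical SMILES
    (b) record -> canonical SMILES
    All three converters are caller-supplied placeholders.
    """
    failures = []
    for record in records:
        via_smiles = smiles_to_canonical(to_smiles(record))
        direct = to_canonical(record)
        if via_smiles != direct:
            failures.append(record)
    return failures


def error_rate(n_failures, n_total):
    """Failure count as a percentage of the dataset size."""
    return 100.0 * n_failures / n_total


DATASET_SIZE = 18084  # PubChem3D structures used in the validation
for label, n_failures in [("March 2009", 1424),
                          ("October 2009", 190),
                          ("Open Babel 2.3.1", 2)]:
    rate = error_rate(n_failures, DATASET_SIZE)
    print(f"{label}: {n_failures} failures = {rate:.2f}%")
```

Working the numbers through gives roughly 7.9%, 1.1% and 0.01%, consistent with the rounded figures of 8%, ~1% and ~0.01% quoted in the text.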
Figure 4 The two failures found in the validation test for reading/writing SMILES.

Another area of focus was the canonicalization algorithm, which can be used to generate canonical SMILES as well as other formats. The algorithm can be tested by ensuring that the same canonical SMILES string is obtained even when the order of atoms in a molecule is changed (while retaining the same connection table). The test stresses all areas of the library, including aromaticity perception, kekulization, stereochemistry, and canonicalization. The development of the canonicalization code in Open Babel was guided by applying this test to the 5,151,179 molecules in the eMolecules catalogue (dated 2011-01-02) with 10 random shuffles of the atom order. At the time of the Open Babel 2.2.3 release, there were 24,404 failures of the canonicalization algorithm; this has now been reduced to only four (shown in Figure 5, <0.001%). The Open Babel nightly test suite ensures that this test passes for a number of problematic molecules. Although the canonicalization algorithm is still not perfect, we believe that the current level of performance (99.99992% success on the eMolecules catalogue) is acceptable for general use and with time we intend to improve performance further.

Given that the error rate for canonicalization and handling of stereochemistry is now quite low, the next area of focus for the Open Babel development team is to improve the handling of implicit valence for “unusual atoms.” This is particularly important for organometallic species and inorganic complexes.

Using Open Babel

Applications

The Open Babel package is composed of a set of user applications as well as a programming library. The main command line application provided is obabel (a small upgrade on the earlier babel), which facilitates file format conversion, filtering (by SMARTS, title, descriptor value, or property field), 3D or 2D structure generation, conversion of hydrogens from implicit to explicit (and vice versa), and removal of small fragments or of duplicate structures. A number of features are provided to handle multi-molecule file formats (such as SDF or MOL2) and to use or manipulate the information in property fields and molecule titles. Here is an example of using obabel to convert from SDF format to SMILES:

    obabel inputmols.sdf -O outputmols.smi

A more complicated use would be to extract all molecules in an SDF file whose titles start with “active”:

    obabel inputmols.sdf -aT -ocopy -O outputmols.sdf --filter "title='active*'"

The copy format specified by “-ocopy” is a utility format that copies the exact contents of the input file (for the filtered molecules) directly to the output, without perception or interpretation. The “-aT” indicates that only the title of the input SDF file should be read; full chemical perception is not required.

The Open Babel graphical user interface (GUI) provides the same functionality. Figure 6 is a screenshot of the GUI carrying out the same filtering operation described in the obabel example above. The left panel deals with setting up the input file, the right panel handles the output and the central panel is for setting conversion options. Depending on whether a particular option requires a parameter, the available options are displayed either as checkboxes or as text entry boxes. These interface elements are generated dynamically directly from the text description and help text provided by each format plugin.
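When the obabel commands above are driven from a script, building the argument list explicitly avoids shell-quoting problems with the filter expression. The sketch below only assembles the two command lines already shown, using the same options; the helper function names are assumptions made for this example, and actually running the commands requires obabel to be installed and on the PATH.

```python
import shlex


def conversion_command(infile, outfile):
    """Argument list for a plain conversion (formats taken from extensions)."""
    return ["obabel", infile, "-O", outfile]


def title_filter_command(infile, outfile, title_pattern):
    """Argument list that copies molecules whose title matches a pattern."""
    return ["obabel", infile, "-aT", "-ocopy", "-O", outfile,
            "--filter", f"title='{title_pattern}'"]


cmd = title_filter_command("inputmols.sdf", "outputmols.sdf", "active*")
print(shlex.join(cmd))

# To execute the command (not done here, since it needs obabel installed):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Passing the list to `subprocess.run` hands each argument to obabel unchanged, so the quotes inside the filter expression do not need further escaping.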
Programming Library

The Open Babel library allows users to write chemistry applications without worrying about the low-level details of handling chemical information, such as how to read or write a particular file format, or how to use SMARTS for substructure searching. Instead, the user can focus on the scientific problem at hand, or on creating a more easy-to-use interface (e.g. a GUI) to some of Open Babel’s functionality. The Open Babel API (Application Programming Interface) is the set of classes, methods and variables provided by Open Babel to the user for use in programs. Documentation on the complete API (generated using Doxygen [42]) is available from the Open Babel website [43], or can be generated from the source code.

The functionality provided by the Open Babel library is relied upon by many users and by several other software projects, with the result that introducing changes to the API would cause existing software to break. For this reason, Open Babel strives to maintain API stability over long periods of time, so that existing software will continue to work despite the release of new Open Babel versions with additional features, file formats and bug fixes. Open Babel uses a version numbering system that indicates how the API has changed with every release:

• Bug fix releases (e.g. 2.0.0 versus 2.0.1) do not change the API at all
• Minor version releases (e.g. 2.0 versus 2.1) will add to the API, but will otherwise be backwards-compatible
• Major version releases (e.g. 2 versus 3) are not backwards-compatible, and have changes to the API (including removal of deprecated classes and functions)

Figure 7 shows an example C++ program that uses the two main classes OBConversion and OBMol to print out the molecular weight of all of the molecules in an SDF file. This could be used, for example, to investigate differences in the molecular weight distribution between two databases. The same program is shown in Figure 8 but implemented using the Python bindings.
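The program listings of Figures 7 and 8 are not reproduced in this transcript. As a rough sketch of what the Python version can look like with Pybel (not necessarily identical to the published figure), the first function below reads the molecular weights from a file. The second function, `weight_histogram`, is a hypothetical helper added here to show how two weight distributions could be compared; it is plain Python and does not need Open Babel. The three weights in the demonstration are illustrative values only.

```python
from collections import Counter


def read_molecular_weights(path, fmt="sdf"):
    """Return the molecular weight of every molecule in a file.

    Requires the Open Babel Python bindings. The import is done inside the
    function so the rest of this example runs without them.
    """
    try:
        from openbabel import pybel   # module layout in Open Babel 3.x
    except ImportError:
        import pybel                  # module layout in Open Babel 2.x
    return [mol.molwt for mol in pybel.readfile(fmt, path)]


def weight_histogram(weights, bin_width=100.0):
    """Count molecules per molecular-weight bin, keyed by the bin's lower edge."""
    return Counter((w // bin_width) * bin_width for w in weights)


# With Open Babel installed, the weights would come from the file itself:
#   weights = read_molecular_weights("dataset.sdf")
weights = [180.16, 151.16, 306.23]  # illustrative values only
histogram = weight_histogram(weights)
for lower_edge in sorted(histogram):
    print(f"{lower_edge:.0f}-{lower_edge + 100:.0f}: {histogram[lower_edge]}")
```

Running `weight_histogram` on the weights from two different databases and comparing the counts bin by bin gives a simple view of how their molecular weight distributions differ.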
Examples of Use

Open Babel has already been referenced over 400 times for various uses. The most common use of Open Babel is through the obabel command line application (or the corresponding graphical user interface) for the interconversion of chemical file formats. Such conversions may also involve the calculation or inference of additional molecular information or application of a filter. Some published examples of these include the following:

• interconversion of chemical file formats or representations [44-47]
• addition of hydrogens [48-50]
• generation of 3D molecular structures [51-53]
• calculation of partial charges [54,55]
• generation of molecular fingerprints [56-59]
• removal of duplicate molecules from a dataset [60]
• calculation of MOL2 atom types [61]

An interesting example that shows how a particular chemical representation may be used to facilitate a scientific study is the crystallographic study of Fábián and Brock, who used Open Babel to generate InChI strings for molecules in the Cambridge Structural Database [62]. Exploiting the fact that InChIs of enantiomers are identical except at the enantiomer sublayer ("/m0" or "/m1"), they used the InChIs as part of a workflow to identify kryptoracemates (a class of racemic crystals where the enantiomers are not related by space-group symmetry) in the database.

To implement new methods, or access additional molecular information, it is necessary to use the Open Babel library directly either from C++ or using one of the supported language bindings. Some examples of published studies that have done this include the following:

• Dehmer et al. implemented molecular complexity measures based on information theory [63].
• Langham and Jain developed a model for chemical mutagenicity based on atom pair features [64].
• Fontaine et al. implemented a method, anchor-GRIND, that uses an anchor point of a molecular scaffold to compare molecular interaction fields when different substituents are present [65].
• Konyk et al. have developed a plugin for Open Babel that adds support for the Web Ontology Language (OWL) to allow automated reasoning about chemical structures [66].
• Kogej et al. (AstraZeneca) implemented a 3-point pharmacophore fingerprint called TRUST [67].

Figure 5 The four failures found in the validation test for canonicalization.

Figure 6 Screenshot of the Open Babel GUI. In the screenshot, the Open Babel GUI is running on Bio-Linux 6.0, an Ubuntu derivative.

Figure 7 Example C++ program that uses the Open Babel library. The program prints out the molecular weight of each molecule in the SDF file "dataset.sdf".

Figure 8 Example Python program that uses the Open Babel library. The program prints out the molecular weight of each molecule in the SDF file "dataset.sdf".

Table 1 Software applications and libraries that use Open Babel

| Name | Description | Reference | Webpage |
|  | GUI for molecular modelling and computational chemistry | G. Hutchison, M. Hanwell | http://avogadro.openmolecules.net/ |
|  | Parse computational chemistry output files | [72] | http://cclib.sf.net/ |
|  | GUI for computational chemistry | Jens Thomas | http://www.cse.scitech.ac.uk/ccg/software/ |
|  | Manage a chemical laboratory database | Rémy Dernat | http://chemaztech.sf.net/ |
| ChemSpotlight | Chemistry file indexer for Mac OS X | G. Hutchison | http://chemspotlight.openmolecules.net/ |
|  | GUI for generating combinatorial libraries | Rui Abreu | http://www.esa.ipb.pt/~ruiabreu/chemt |
|  | 2D molecular drawing | [73] | http://ruby.chemie.uni-freiburg.de/~martin/ |
|  | Library for handling and preparing multi-scale multi-paradigm | [74] | http://web.mit.edu/mbuehler/www/research/CMDF/CMDF.htm |
|  | Systematically generate conformers | [36] | http://confab.googlecode.com/ |
| DockoMatic | Automate the preparation and analysis of AutoDock runs | [75] | http://sf.net/projects/dockomatic/ |
| DOVIS 2.0 | Automate the preparation and analysis of AutoDock runs | [76] | http://www.bhsai.org/dovis.html |
| FAF-Drugs2 | ADMET filtering of molecular datasets | [77] | http://www.mti.univ-paris-diderot.fr/fr/downloads.html |
|  | Large-scale chemical graph mining based on backbone refinement classes | [78,79] | http://www.maunz.de/wordpress/bbrc |
|  | GUI for computational chemistry |  | http://www.uku.fi/~thassine/projects/ |
| Chemistry Utils | 2D chemical editor, 3D viewer, chemical calculator and periodic table for Linux | Jean Bréfort | http://gchemutils.nongnu.org/ |
|  | Mac OS X interface to Open Babel and other Open chemistry tools | Chris Swain | http://homepage.mac.com/swain/Sites/Macinchem/page65/ibabel3.html |
|  | GUI showing information on the periodic table of the elements | Carsten | http://edu.kde.org/kalzium/ |
|  | Lazy Structure-Activity Relationships for toxicity prediction | [80] | http://www.in-silico.de/software/ |
|  | GUI for computational chemistry | Ugo Varetto | http://molekel.cscs.ch/ |
| molsKetch | 2D chemical editor | Harm van | http://molsketch.sf.net/ |
|  | Chemistry extension to the MySQL database | J. Pansanel | http://mychem.sf.net/ |
| NanoEngineer- | Computer-aided design for the nanoscale | Nanorex, Inc. | http://nanoengineer-1.net/ |
| NanoHive-1 | Simulator for the study, experimentation, and development of nanotech entities | Brian Helfrich | http://www.nanohive-1.org/ |
|  | Open Source molecular dynamics engine | [81] | http://openmd.net/ |
| Open3DQSAR | High-throughput chemometric analysis of molecular interaction fields | [82,83] | http://www.open3dqsar.org/ |
|  | Extracts chemical structures from images | [84] | http://osra.sf.net/ |
|  | Chemistry extension to the PostgreSQL database |  | http://pgfoundry.org/projects/pgchem |
|  | Pharmacophore discovery and searching | Silicos NV | http://www.silicos.be/ |
|  | Pharmacophore searching | [85] | http://smoothdock.ccbb.pitt.edu/pharmer |
|  | Shape-based alignment of molecules | Silicos NV | http://www.silicos.be/ |
|  | Library for handling and preparing quantum mechanical multi-scale simulations | [86] | http://www.ipc.kit.edu/cfn-ysg/158.php |
|  | GUI for virtual screening with protein-ligand docking |  | http://pyrx.scripps.edu/ |
|  | GUI for analysing results of quantum chemistry calculations | [72] | http://qmforge.sf.net/ |
|  | Reaction Mechanism Generator | [87] | http://rmg.sf.net/ |
|  | Visualization of 3D models of scientific data, such as molecular structures and surfaces | T.J. O'Donnell | http://sci3d.sf.net/ |
|  | Filter molecules from datasets | Silicos NV | http://www.silicos.be/ |
|  | Generation of fragment-based structure-activity relationships | [88] | http://www.karwath.org/systems/smirep.html |
|  | Extract molecular scaffolds | Silicos NV | http://www.silicos.be/ |
|  | Toxic hazard estimation using decision trees | Ideaconsult | http://toxtree.sf.net/ |
|  | Visualize atomic structures such as crystals and grain boundaries | Damien Caliste | http://inac.cea.fr/L_Sim/V_Sim/index.en.html |
|  | Web application for file format conversion | T.J. O'Donnell | http://webbabel.sf.net/ |
|  | 2D molecular editor | Bryan Herger | http://xdrawchem.sf.net/ |
|  | Extension to Avogadro for crystal-structure prediction | [89] | http://xtalopt.openmolecules.net/ |
|  | GUI for molecular graphics, modeling and simulation | Elmar Krieger | http://www.yasara.org/ |
|  | GUI for molecular modelling and docking | [90] | http://www.zeden.org/ |
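The enantiomer comparison exploited by Fábián and Brock, described earlier in this section, can be illustrated with plain string handling. The sketch below is not their workflow and does not call Open Babel; it assumes the InChI strings have already been generated, and the names strip_enantiomer_layer and are_enantiomers are hypothetical. It removes only the first "/m0" or "/m1" sublayer, so InChIs that carry further stereo sublayers (for example in fixed-hydrogen or isotopic layers) would need more careful parsing. The two alanine InChIs in the comment serve purely as illustrative input.

```python
import re

# Matches the enantiomer sublayer "/m0" or "/m1" when it is followed by
# another layer separator or by the end of the string.
_M_LAYER = re.compile(r"/m[01](?=/|$)")


def strip_enantiomer_layer(inchi):
    """Return the InChI with its first /m0 or /m1 sublayer removed."""
    return _M_LAYER.sub("", inchi, count=1)


def are_enantiomers(inchi_a, inchi_b):
    """True if the two InChIs differ, but only in the /m sublayer."""
    if inchi_a == inchi_b:
        return False
    return strip_enantiomer_layer(inchi_a) == strip_enantiomer_layer(inchi_b)


# Illustrative input: the two enantiomers of alanine.
#     are_enantiomers(
#         "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m0/s1",
#         "InChI=1S/C3H7NO2/c1-2(4)3(5)6/h2H,4H2,1H3,(H,5,6)/t2-/m1/s1",
#     )
```

A search for racemic crystals could group structures by the stripped string and then check whether both "/m0" and "/m1" forms occur within a group.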
Many other examples of studies that use the library directly exist [68-71].

The vital role that a cheminformatics toolkit plays in the development of scientific resources is shown by Tables 1 and 2. Table 1 lists examples of stand-alone applications or programming libraries that rely on Open Babel, either calling the library directly or via one of the command-line executables. Table 2 contains examples of web applications and databases that either use Open Babel on the server or where Open Babel was used in the preparation of the data.

Table 2 Web applications and databases that use Open Babel

| Name | Description | Reference | Webpage |
|  | Database of small molecules | [91] | http://cdb.ics.uci.edu/ |
|  | Chemical structure and property search engine | Céondo Ltd | http://www.chemeo.com/ |
|  | Web application for analysing and clustering small molecules | [92] | http://chemmine.ucr.edu/ |
|  | Chemical vendor search engine |  | http://emolecules.com/ |
| FragmentStore | Database for comparison of fragments found in metabolites, drugs and toxic compounds | [93] | http://bioinf-applied.charite.de/fragment_store/ |
|  | FRee Online druG 3D conformation generation | [94] | http://bioserv.rpbs.univ-paris-diderot.fr/cgi- |
| hBar Lab | Web application providing on-demand access to computer-aided | hBar Solutions | https://www.hbar-lab.com/ |
|  | Database of human drug targets and their ligands | [95] | http://www.iuphar-db.org/ |
| OpenCDLig | Web application for sharing resources about cyclodextrin/ligand | [96] | https://kdd.di.unito.it/casmedchem/ |
|  | Protein-Small-Molecule Database | [97] | http://compbio.cs.toronto.edu/psmdb/ |
|  | Web application for calculation of buried volume of organometallic | [98] | https://www.molnac.unisa.it/OMtools/ |
|  | Database of molecular scaffolds | [99] | http://202.127.30.184:8080/scafbank.html |
|  | Web application for prediction of sites of cytochrome P450 mediated metabolism | [100] | http://www.farma.ku.dk/smartcyp/ |
| sMolExplorer | Web application for exploring small-molecule datasets | [101] | http://www3a.biotec.or.th/isl/index.php/smol- |
| SuperImposé | Web application for structural similarity between ligands, binding sites or proteins | [102] | http://farnsworth.charite.de/superimpose- |
|  | Database of toxic compounds | [103] | http://bioinformatics.charite.de/supertoxic/ |
|  | Detailed information on, and comparisons of, protein-ligand binding sites | [104] | http://bioinf-tomcat.charite.de/supersite/ |
|  | Database of natural and artificial sweeteners | [105] | http://bioinf-applied.charite.de/sweet/ |
|  | Chemical-protein interactions | [106] | http://stitch.embl.de/ |
|  | Virtual Computational Chemistry Laboratory | [107] | http://www.vcclab.org/ |
| wwLigCSRre | Web application that performs ligand-based screening using 3D | [108] | http://bioserv.rpbs.univ-paris-diderot.fr/Help/wwLigCSRre.html |

Conclusions

In November 2011, Open Babel will mark 10 years of existence as an independent project, and for the first time, we have discussed its development and features. As shown by more than 400 citations, it has become an essential tool for handling the myriad of molecular file formats encountered in diverse branches of chemistry. While more work remains to be done, through validation processes such as those described above and the recent introduction of a nightly build and testing framework, we aim to improve the quality and robustness of the toolkit with each new release.

Looking to the future, one of the goals of the project is to extend support to molecules that currently are not handled very well by existing cheminformatics toolkits. Typically toolkits focus on the types of molecules of principal importance to the pharmaceutical industry, namely stable organic molecules consisting wholly of 2-center 2-electron covalent bonds. Molecules outside this set, such as radicals, organometallic and inorganic molecules, molecules with coordinate bonds or 3-center 2-electron bonds, are poorly supported in general. Future releases of Open Babel will provide substantially improved handling of such species. We also seek to improve speed and coverage of important methods such as structure generation and kekulization.

Open Babel is freely available from http://openbabel.org, and new community members are very welcome (users, developers, bug reporters, feature requesters). For information on how to use Open Babel, please see the documentation at http://openbabel.org/docs and the API documentation at http://openbabel.org/api.

Availability and Requirements

Project name: Open Babel
Project home page: http://openbabel.org
Operating system(s):
Programming language: C++, bindings to Python, Perl, Ruby, Java, C#
Other requirements (if compiling): CMake 2.4+
License: GNU GPL v2
Any restrictions to use by non-academics:

Acknowledgements and Funding

We would like to thank all users and contributors to the Open Babel project over its history, including OpenEye Scientific Software Inc. for their initial OELib code. We also thank the Blue Obelisk Movement for ideas, comments on this manuscript, and support. We thank SourceForge for providing resources for issue tracking and managing releases, and Kitware for additional dashboard resources. NMOB is supported by a Health Research Board Career Development Fellowship (PD/2009/13).

Author details

1 Analytical and Biological Chemistry Research Facility, Cavanagh Pharmacy Building, University College Cork, Co. Cork, Ireland.
2 Department of Chemistry, Technische Universität München, Garching D-85747, Germany.
3 eMolecules, Inc., 420 Stevens Ave #120, Solana Beach, CA 92075, USA.
4 Open Babel development team.
5 University of Pittsburgh, Department of Chemistry, 219 Parkman Avenue, Pittsburgh, PA 15217, USA.

Authors' contributions

GRH is the lead developer of the Open Babel project. CAJ, CM, MB, NMOB, and TV are developers of Open Babel. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Received: 27 June 2011 Accepted: 7 October 2011 Published: 7 October 2011

References

1. Weininger D: SMILES, a chemical language and information system. 1. Introduction to methodology and encoding rules. J Chem Inf Comput Sci
2. Murray-Rust P, Rzepa H: Chemical markup, XML, and the Worldwide Web. 1. Basic principles. J Chem Inf Comput Sci
3. Murray-Rust P, Rzepa HS: Chemical Markup, XML and the World-Wide Web. 2. Information Objects and the CMLDOM. J Chem Inf Model
4. Murray-Rust P, Rzepa H, Wright M: Development of chemical markup language (CML) as a system for handling complex chemical content. New J Chem
5. Murray-Rust P, Rzepa H: Chemical Markup, XML, and the World Wide Web. 4. CML Schema. J Chem Inf Comput Sci
6. Holliday GL, Murray-Rust P, Rzepa HS: Chemical Markup, XML, and the World Wide Web. 6. CMLReact, an XML Vocabulary for Chemical J Chem Inf Model
7. Daylight Theory: SMARTS. http://www.daylight.com/dayhtml/doc/theory/
8. Fogel K: Producing Open Source Software: How to Run a Successful Free Software Project. O'Reilly Media, Inc. Sebastopol, CA; 2005.
9. Citations were generated by Google Scholar: http://scholar.google.com/
10. A selection of such projects is included below. The full list is available at:
11. Open Babel: http://openbabel.org/
12. Open Babel Report Format: http://openbabel.org/docs/2.3.0/FileFormats/
13. Open Babel Fingerprint Format: http://openbabel.org/docs/2.3.0/
14. Open Babel Fastsearch Format: http://openbabel.org/docs/2.3.0/
15. MolPrint2D Format: http://openbabel.org/docs/2.3.0/FileFormats/
16. Bender A, Mussa HY, Glen RC, Reiling S: Molecular Similarity Searching Using Atom Environments, Information-Based Feature Selection, and a Naïve Bayesian Classifier. J Chem Inf Model
17. MNA Format: http://openbabel.org/docs/2.3.0/FileFormats/
18. Filimonov D, Poroikov V, Borodina Y, Gloriozova T: Chemical Similarity Assessment through Multilevel Neighborhoods of Atoms: Definition and Comparison with the Other Descriptors. J Chem Inf Model
19. PDB Format v3.2: http://www.wwpdb.org/documentation/format32/v3.2.
20. PDB: Cruft to Content: http://www.daylight.com/meetings/mug01/Sayle/
21. Morgan HL: The Generation of a Unique Machine Description for Chemical Structures - A Technique Developed at Chemical Abstracts J Chem Docum
22. Nauty: http://cs.anu.edu.au/~bdm/nauty/
23. McKay BD: Practical graph isomorphism. Congressus Numerantium
24. Gakh A, Burnett M: Modular Chemical Descriptor Language (MCDL): Composition, connectivity, and supplementary modules. J Chem Inf Comput Sci
25. Trepalin SV, Yarkov AV, Pletnev IV, Gakh AA: A Java Chemical Structure Editor Supporting the Modular Chemical Descriptor Language (MCDL).
26. Gakh AA, Burnett MN, Trepalin SV, Yarkov AV: Modular Chemical Descriptor Language (MCDL): Stereochemical modules. J Cheminf
27. Halgren T: Merck molecular force field. 1. Basis, form, scope, parameterization, and performance of MMFF94. J Comput Chem
28. Halgren T: Merck molecular force field. 2. MMFF94 van der Waals and electrostatic parameters for intermolecular interactions. J Comput Chem
29. Halgren T: Merck molecular force field. 3. Molecular geometries and vibrational frequencies for MMFF94. J Comput Chem
30. Halgren T, Nachbar R: Merck molecular force field. 4. Conformational energies and geometries for MMFF94. J Comput Chem
31. Halgren T: Merck molecular force field. 5. Extension of MMFF94 using experimental data, additional computational data, and empirical rules. Comput Chem
32. Andronico A, Randall A, Benz RW, Baldi P: Data-driven high-throughput prediction of the 3-D structure of small molecules: review and progress. J Chem Inf Model
33. Rappe A, Casewit C, Colwell K, Goddard W III, Skiff WM: UFF, a full periodic table force field for molecular mechanics and molecular dynamics J Am Chem Soc
34. Wang J, Wolf RM, Caldwell JW, Kollman PA, Case DA: Development and testing of a general amber force field. J Comput Chem
35. Wang J, Wang W, Kollman PA, Case DA: Automatic atom type and bond type perception in molecular mechanical calculations. J Molec Graph
36. O'Boyle NM, Vandermeersch T, Flynn CJ, Maguire AR, Hutchison GR: - Systematic generation of diverse low-energy conformers. J Cheminf
37. CMake: http://www.cmake.org/
38. Martin K, Hoffman B: Mastering CMake: A Cross-Platform Build System. Kitware, Inc., Clifton Park, NY; 5 2010.
39. CDash Dashboard for Open Babel: http://my.cdash.org/index.php?
40. O'Boyle N, Morley C, Hutchison GR: Pybel: a Python wrapper for the OpenBabel cheminformatics toolkit. Chem Cent J
41. Open Babel Bug Tracker: https://sourceforge.net/tracker/?
42. Doxygen: http://www.doxygen.org/
43. Open Babel API: http://openbabel.org/api
44. Myers J, Allison T, Bittner S, Didier B, Frenklach M, Green W, Ho Y, Hewson J, Koegler W, Lansing C, et al: A collaborative informatics infrastructure for multi-scale science. Cluster Computing
45. Lind P, Alm M: A Database-Centric Virtual Chemistry System. J Chem Inf
46. Amini A, Shrimpton PJ, Muggleton SH, Sternberg MJE: A general approach for developing system-specific functions to score protein-ligand docked complexes using support vector inductive logic programming. Struct, Funct, Bioinf
47. Arbor S, Marshall GR: A virtual library of constrained cyclic tetrapeptides that mimics all four side-chain orientations for over half the reverse turns in the protein data bank. J Comput-Aided Mol Des
48. Huang Z, Wong CF: A Mining Minima Approach to Exploring the Docking Pathways of p-Nitrocatechol Sulfate to YopH. Biophys J
49. Hill AD, Reilly PJ: A Gibbs free energy correlation for automated docking of carbohydrates. J Comput Chem
50. Armen RS, Chen J, Brooks CL III: An Evaluation of Explicit Receptor Flexibility in Molecular Docking Using Molecular Dynamics and Torsion Angle Molecular Dynamics. J Chem Theory Comp
51. Liu L, Ma H, Yang N, Tang Y, Guo J, Tao W, Jaa Duan: A Series of Natural Flavonoids as Thrombin Inhibitors: Structure-activity relationships. Thromb Res
52. Wallach I, Jaitly N, Lilien R: A Structure-Based Approach for Mapping Adverse Drug Reactions to the Perturbation of Underlying Biological PLoS One
53. Paila YD, Tiwari S, Sengupta D, Chattopadhyay A: Molecular modeling of the human serotonin1A receptor: role of membrane cholesterol in ligand binding of the receptor. Molecular BioSystems
54. Melville JL, Hirst JD: TMACC: Interpretable Correlation Descriptors for Quantitative Structure Activity Relationships. J Chem Inf Model
55. Pencheva T, Lagorce D, Pajeva I, Villoutreix BO, Miteva MA: Automated Molecular Mechanics Optimization tool for in silico BMC Bioinformatics
56. Schietgat L, Ramon J, Bruynooghe M: An Efficiently Computable Graph-Based Metric for the Classification of Small Molecules. Proceedings of the 11th International Conference on Discovery Science Springer-Verlag Berlin, Heidelberg; 2008, 197-209.
57. Krier M, Hutter MC: Bioisosteric Similarity of Molecules Based on Structural Alignment and Observed Chemical Replacements in Drugs. Chem Inf Model
58. Wang X, Huan J, Smalter A, Lushington GH: Application of kernel functions for accurate similarity search in large chemical databases.
59. Cheng T, Li Q, Wang Y, Bryant SH: Binary Classification of Aqueous Solubility Using Support Vector Machines with Reduction and Recombination Feature Selection. J Chem Inf Model
60. Mihaleva VV, Verhoeven HA, de Vos RCH, Hall RD, van Ham RCHJ: Automated procedure for candidate compound selection in GC-MS metabolomics based on prediction of Kovats retention index.
61. Bas DC, Rogers DM, Jensen JH: Very fast prediction and rationalization of pKa values for protein-ligand complexes. Proteins: Struct, Funct, Bioinf
62. Fabian L, Brock CP: A list of organic kryptoracemates. Acta Cryst
63. Dehmer M, Barbarini N, Varmuza K, Graber A: A Large Scale Analysis of Information-Theoretic Network Complexity Measures Using Chemical PLoS One
64. Langham JJ, Jain AN: Accurate and Interpretable Computational Modeling of Chemical Mutagenicity. J Chem Inf Model
65. Fontaine F, Pastor M, Zamora I: Anchor-GRIND: Filling the gap between standard 3D QSAR and the GRid-INdependent Descriptors. J Med Chem
66. Konyk M, De Leon A, Dumontier M: Chemical knowledge for the semantic Data Integration in the Life Sciences Springer-Verlag Berlin, Heidelberg; 2008, 169-176.
67. Kogej T, Engkvist O, Blomberg N, Muresan S: Multifingerprint Based Similarity Searches for Targeted Class Compound Selection. J Chem Inf
68. Reynès C, Host H, Camproux A-C, Laconde G, Leroux F, Mazars A, Deprez B, Fahraeus R, Villoutreix BO, Sperandio O: Designing Focused Chemical Libraries Enriched in Protein-Protein Interaction Inhibitors using Machine-Learning Methods. PLoS Computational Biology
69. Lagorce D, Pencheva T, Villoutreix BO, Miteva MA: DG-AMMOS: A New tool to generate 3D conformation of small molecules using Distance Geometry and Automated Molecular Mechanics Optimization for in silico Screening. BMC Chemical Biology
70. Gómez MJ, Pazos F, Guijarro FJ, de Lorenzo V, Valencia A: environmental fate of organic pollutants through the global microbial Molecular Systems Biology
71. Kazius J, Nijssen S, Kok J, Bäck T, IJzerman AP: Substructure Mining Using Elaborate Chemical Representation. J Chem Inf Model
72. O'Boyle NM, Tenderholt AL, Langner KM: cclib: A library for package-independent computational chemistry algorithms. J Comput Chem
73. Brüstle M: Chemtool - Moleküle zeichnen mit dem Pinguin. aus der Chemie
74. Buehler M, Dodson J, van Duin A: The Computational Materials Design Facility (CMDF): A powerful framework for multi-paradigm multi-scale Materials Research Society symposium proceedings
75. Bullock CW, Jacob RB, McDougal OM, Hampikian G, Andersen T: Dockomatic - automated ligand creation and docking. BMC Research
76. Jiang X, Kumar K, Hu X, Wallqvist A, Reifman J: DOVIS 2.0: an efficient and easy to use parallel virtual screening tool based on AutoDock 4.0. Cent J
77. Lagorce D, Sperandio O, Galons H, Miteva MA, Villoutreix BO: Free ADME/tox filtering tool to assist drug discovery and chemical biology projects. BMC Bioinformatics
78. Maunz A, Helma C, Kramer S: Efficient mining for structurally diverse subgraph patterns in large molecular databases. Machine Learning
79. Maunz A, Helma C, Kramer S: Large-scale graph mining using backbone refinement classes. Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2009) ACM Paris; 2009, 617-626.
80. Helma C: Lazy structure-activity relationships (lazar) for the prediction of rodent carcinogenicity and Salmonella mutagenicity. Mol Diversity
81. Meineke MA, Vardeman CF, Lin T, Fennell CJ, Gezelter JD: OOPSE: an object-oriented parallel simulation engine for molecular dynamics. Comput Chem
82. Tosco P, Balle T: Brute-force pharmacophore assessment and scoring with Open3DQSAR. J Cheminf 3(Suppl 1)
83. Tosco P, Balle T: Open3DQSAR: a new open-source software aimed at high-throughput chemometric analysis of molecular interaction fields. Mol Model
84. Filippov IV, Nicklaus MC: Optical Structure Recognition Software To Recover Chemical Information: OSRA, An Open Source Solution. J Chem Inf Model
85. Koes DR, Camacho CJ: Pharmer: Efficient and Exact Pharmacophore J Chem Inf Model
86. Jacob CR, Beyhan SM, Bulo RE, Gomes ASP, Götz AW, Kiewisch K, Sikkema J, Visscher L: PyADF - A scripting framework for multiscale quantum J Comput Chem
87. Green H William, Allen W Joshua, Ashcraft W Robert, Beran J Gregory, Class A Caleb, Gao Connie, Franklin Goldsmith C, Harper R Michael, Jalan Amrit, Magoon R Gregory, Matheu M David, Merchant S Shamel, Mo D Jeffrey, Petway Sarah, Raman Sumathy, Sharma Sandeep, Song Jing, Van Geem M Kevin, Wen John, West H Richard, Wong Andrew, Wong Hsi-Wu, Yelvington E Paul, Yu Joanna: RMG - Reaction Mechanism Generator 2011. http://rmg.sourceforge.net/
88. Karwath A, De Raedt L: SMIREP: Predicting Chemical Activity from SMILES. J Chem Inf Model
89. Lonie DC, Zurek E: XTALOPT: An open-source evolutionary algorithm for crystal structure prediction. Comput Phys Commun
90. Zonta N, Grimstead IJ, Avis NJ, Brancale A: Accessible haptic technology for drug design applications. J Mol Model
91. Chen JH, Linstead E, Swamidass SJ, Wang D, Baldi P: ChemDB update full-text search and virtual chemical space.
92. Backman TWH, Cao Y, Girke T: ChemMine tools: an online service for analyzing and clustering small molecules. Nucleic Acids Res (Server issue)
93. Ahmed J, Worth CL, Thaben P, Matzig C, Blasse C, Dunkel M, Preissner R: a comprehensive database of fragments linking metabolites, toxic molecules and drugs. Nucleic Acids Res
94. Miteva MA, Guyon F, Tuffery P: Frog2: Efficient 3D conformation ensemble generator for small compounds. Nucleic Acids Res
95. Sharman JL, Mpamhanga CP, Spedding M, Germain P, Staels B, Dacquet C, Laudet V, Harmar AJ, NC-IUPHAR: IUPHAR-DB: new receptors and tools for easy searching and visualization of pharmacological data. Nucleic Acids
96. Esposito R, Ermondi G, Caron G: OpenCDLig: a free web application for sharing resources about cyclodextrin/ligand complexes. J Comput-Aided Mol Des
97. Wallach I, Lilien R: The protein-small-molecule database, a non-redundant structural resource for the analysis of protein-ligand binding.
98. Poater A, Cosenza B, Correa A, Giudice S, Ragone F, Scarano V, Cavallo L: SambVca: A Web Application for the Calculation of the Buried Volume of N-Heterocyclic Carbene Ligands. Eur J Inorg Chem
99. Yan B-b, Xue M-z, Xiong B, Liu K, Hu D-y, Shen J-k: ScafBank: a public comprehensive Scaffold database to support molecular hopping. Pharmacologica Sinica
100. Rydberg P, Gloriam DE, Olsen L: The SMARTCyp cytochrome P450 metabolism prediction server.
101. Ingsriswang S, Pacharawongsakda E: sMOL Explorer: an open source, web-enabled database and exploration tool for Small MOLecules datasets.
102. Bauer RA, Bourne PE, Formella A, Frommel C, Gille C, Goede A, Guerler A, Hoppe A, Knapp EW, Poschel T, et al: Superimpose: a 3D structural superposition server. Nucleic Acids Res
103. Schmidt U, Struck S, Gruening B, Hossbach J, Jaeger IS, Parol R, Lindequist U, Teuscher E, Preissner R: SuperToxic: a comprehensive database of toxic compounds. Nucleic Acids Res
104. Bauer RA, Gunther S, Jansen D, Heeger C, Thaben PF, Preissner R: dictionary of metabolite and drug binding sites in proteins. Nucleic Acids
105. Ahmed J, Preissner S, Dunkel M, Worth CL, Eckert A, Preissner R: a resource on natural and artificial sweetening agents. Nucleic Acids Res
106. Kuhn M, Szklarczyk D, Franceschini A, Campillos M, von Mering C, Jensen LJ, Beyer A, Bork P: STITCH 2: an interaction network database for small molecules and proteins. Nucleic Acids Res
107. Tetko IV, Gasteiger J, Todeschini R, Mauri A, Livingstone D, Ertl P, Palyulin VA, Radchenko EV, Zefirov NS, Makarenko AS, et al: Computational Chemistry Laboratory - Design and Description. Comput-Aided Mol Des
108. Sperandio O, Petitjean M, Tuffery P: wwLigCSRre: a 3D ligand-based server for hit identification and optimization. Nucleic Acids Res

doi:10.1186/1758-2946-3-33
Cite this article as: O'Boyle et al.: Open Babel: An open chemical toolbox. Journal of Cheminformatics 2011, 3:33.
