Three Years of Experience with Sledgehammer
L. C. Paulson and J. C. Blanchette

The two aspects of problem preparation (translation into first-order logic and identification of relevant facts) each required a substantial research effort. The numerous choices outlined below were made on the basis of innumerable experiments that consumed many thousands of hours of processor time.

2.1 Translation into First-Order Logic

Most interactive theorem provers support a language much richer than that of first-order logic. Isabelle/HOL [21] supports polymorphic higher-order logic [2, 9, 10], augmented with axiomatic type classes [39].¹ Many user problems contain no higher-order features and might be imagined to lie within first-order logic; however, even these problems are full of typing information. Type information can take quadratic space [17] because every term must be annotated with its type, recursively, right down to the variables.

Hurd [13] observed that omitting type information greatly improved the success rate of his theorem prover, Metis. This is hardly surprising, since the type information virtually buries the terms themselves. Hurd was able to omit type information because his proofs are reconstructed within HOL4 [10], which rejected any proofs that did not correspond to well-typed higher-order logic deductions.

Sledgehammer was always intended to rely on an analogous process of sound proof reconstruction, and from the outset it was clear that including complete type information would be unworkable. Completely omitting type information, although successful for HOL4, would not have worked for Isabelle because of its heavy use of type classes. We chose to include enough type information to enforce correct type class reasoning (the type class hierarchy is easily expressed using Horn clauses) but not to specify the type of every term [19, §4]. Some colleagues have expressed horror at the very idea of using unsound translations; the first author has written a lengthy exploration of the salient issues [17, §2.8].

Higher-order problems posed special difficulties. We never expected first-order theorem provers to perform deep higher-order reasoning, but merely hoped to automate proofs where the higher-order steps were trivial. We examined several methods of translating higher-order problems into first-order logic, allowing truth values to be the values of terms and curried functions to take varying numbers of arguments [17]. We eventually adopted a translation based on the one that we used for first-order logic, modified to introduce higher-order mechanisms (such as an apply operator for function values) only when absolutely necessary. We thereby eliminated our original distinction between first-order and higher-order problems. A higher-order feature within a problem affects the translation locally, yielding a smooth transition from purely first-order to heavily higher-order problems.

We also experimented with two methods of eliminating λ-abstractions in terms: by translating them into combinator form or by declaring equivalent functions. We ultimately opted for a naive translation scheme based on the combinators S, K, I, B, and C: more sophisticated schemes delivered no additional benefits.

Unfortunately, our experience suggests that Sledgehammer is seldom successful on problems containing higher-order elements. Integration with a genuine higher-order automatic theorem prover, such as LEO-II [5] and Satallax [3], seems necessary. This would pose interesting problems for proof reconstruction: LEO-II's approach is to reduce higher-order problems to first-order ones by repeatedly applying specialised inference rules and then calling first-order ATPs. A LEO-II proof will therefore consist of a string of higher-order steps followed by a first-order proof. The latter part we know how to do; the crucial challenge is to devise a reliable way of emulating the higher-order steps within Isabelle.

Arithmetic remains an issue. A purely arithmetic problem can be solved using decision procedures, but what about problems that combine arithmetic with a significant amount of logic? In principle, Sledgehammer could solve such problems with the help of an ATP that combined arithmetic and logical reasoning, analogous to LEO-II's approach to higher-order logic. Current SMT solvers are probably of little

¹ Isabelle [26] is a generic theorem prover, based on a logical framework [23]. Isabelle/HOL is the specialisation of Isabelle for higher-order logic.
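
The combinator translation of λ-abstractions described in Section 2.1 can be illustrated with a small sketch. The Python code below is not Sledgehammer's implementation (which is part of Isabelle and written in Standard ML); it is a minimal, assumed rendering of the textbook bracket-abstraction rules for S, K, I, B, and C, and the names Var, Const, App, Lam, abstract, eliminate_lambdas, and show are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    name: str  # ordinary constants and the combinators "S", "K", "I", "B", "C"


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


Term = Union[Var, Const, App, Lam]


def free_in(x: str, t: Term) -> bool:
    """True if variable x occurs free in term t."""
    if isinstance(t, Var):
        return t.name == x
    if isinstance(t, Const):
        return False
    if isinstance(t, App):
        return free_in(x, t.fun) or free_in(x, t.arg)
    return t.var != x and free_in(x, t.body)


def abstract(x: str, t: Term) -> Term:
    """Combinator term equivalent to (λx. t); t must already be λ-free."""
    if isinstance(t, Var) and t.name == x:
        return Const("I")                      # λx. x        =  I
    if not free_in(x, t):
        return App(Const("K"), t)              # λx. t        =  K t
    # x occurs free and t is not x itself, so t is an application.
    f, a = t.fun, t.arg
    in_f, in_a = free_in(x, f), free_in(x, a)
    if in_f and in_a:                          # λx. f a      =  S (λx. f) (λx. a)
        return App(App(Const("S"), abstract(x, f)), abstract(x, a))
    if in_f:                                   # x only in f  =  C (λx. f) a
        return App(App(Const("C"), abstract(x, f)), a)
    return App(App(Const("B"), f), abstract(x, a))  # x only in a = B f (λx. a)


def eliminate_lambdas(t: Term) -> Term:
    """Remove every λ-abstraction from t, innermost first."""
    if isinstance(t, App):
        return App(eliminate_lambdas(t.fun), eliminate_lambdas(t.arg))
    if isinstance(t, Lam):
        return abstract(t.var, eliminate_lambdas(t.body))
    return t


def show(t: Term) -> str:
    """Print a term with left-associative application."""
    if isinstance(t, (Var, Const)):
        return t.name
    if isinstance(t, App):
        arg = show(t.arg)
        if isinstance(t.arg, App):
            arg = f"({arg})"
        return f"{show(t.fun)} {arg}"
    return f"(λ{t.var}. {show(t.body)})"


print(show(eliminate_lambdas(Lam("x", App(Const("f"), Var("x"))))))  # B f I
```

The scheme is deliberately naive, in the spirit of the text: it performs no η-reduction, so λx. f x becomes B f I rather than simply f. Each combinator is then an ordinary first-order constant, and its defining equation can be supplied to the ATP as an axiom.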
used Sledgehammer many times while constructing a proof, would it be feasible to run that proof again, perhaps to modify it using a laptop while at a conference? To be useful, Sledgehammer would have to return a piece of proof script that could be executed cheaply.

2.3.1 Reconstruction of the Resolution Proof

The original plan was to emulate the inference rules of automatic theorem provers directly within Isabelle. We should have known better: Hurd [12] had noticed that the proofs delivered by Gandalf [35] were not detailed and explicit enough. We made the same discovery with SPASS and, despite considerable efforts, were only able to reconstruct a handful of proofs [19].

We came up with a new plan: to use a general theorem prover, Metis, to reconstruct each proof step. Metis was designed to be interfaced with LCF-style interactive theorem provers, specifically HOL4. Integrating it with Isabelle's proof kernel required significant effort [28]. Metis then became available to Isabelle users, and it turned out to be capable of reconstructing proof steps easily. The output of Sledgehammer was now a list of calls to Metis, each of which proved a clause.

While the output is primarily designed for replaying proofs, it also has a pedagogical value: unlike Isabelle's automatic tactics, which are black boxes, the proofs delivered by Sledgehammer can be inspected and understood. Consider the theorem length (tl xs) ≤ length xs, which states that the tail of a list (the list from which we remove its first element, or the empty list if the list is empty) is shorter than or of equal length as the original list. The proof produced by Vampire, expressed in Isabelle's structured Isar format, looks as follows:

    proof neg_clausify
      assume ¬ length (tl xs) ≤ length xs
      hence drop (length xs) (tl xs) ≠ [] by (metis drop_eq_Nil)
      hence tl (drop (length xs) xs) ≠ [] by (metis drop_tl)
      hence ∀u. xs @ u ≠ xs ∨ tl u ≠ [] by (metis append_eq_conv_conj)
      hence tl [] ≠ [] by (metis append_Nil2)
      thus False by (metis tl.simps(1))
    qed

The neg_clausify method transforms the Isabelle conjecture into negated clause form, ensuring that it has the same shape as the corresponding ATP conjecture. The negation of the clause is introduced by the assume keyword, and a series of intermediate facts introduced by hence lead to a contradiction. This approach was inspired by the Otterfier proof transformation service [40].

Resolution proofs should ideally be translated to natural, intuitive Isabelle proofs. The best-known prior work on translating resolution proofs is TRAMP [16]; its applicability to Sledgehammer is unexplored. Preliminary work has commenced at Munich to see to what extent resolution proofs can be transformed into intelligible proofs. The first step is to transform the proof into a direct proof by applying contraposition repeatedly and introducing case splits where appropriate. For example, the proof above is transformed into

    proof -
      have tl [] = [] by (metis tl.simps(1))
      hence ∃u. xs @ u = xs ∧ tl u = [] by (metis append_Nil2)
      hence tl (drop (length xs) xs) = [] by (metis append_eq_conv_conj)
      hence drop (length xs) (tl xs) = [] by (metis drop_tl)
      thus length (tl xs) ≤ length xs by (metis drop_eq_Nil)
    qed

For most Isabelle users, the direct proof is much easier to understand and maintain.

be better to employ even more theorem provers. We have undertaken informal, unpublished experiments involving many other systems.

Gandalf [35] shows great potential, but unfortunately it does not output useful proofs; one cannot easily identify which axioms have taken part in the proof. A simple source code modification to improve the legibility of proofs would allow Gandalf to make useful contributions. Unfortunately, we were unable to identify the necessary changes. Gandalf has been found to be unsound,² but a small percentage of incorrect (and hence unreconstructable) proofs would be tolerable.

SInE, the Sumo Inference Engine [11], is a wrapper around E that is designed to cope with large axiom bases. We pass it more facts than can be handled by the other ATPs, and it sometimes surprises us with original proofs. In the current experimental setup, it is invoked remotely via SystemOnTPTP [34] in parallel with Vampire.

People sometimes suggest that we include Prover9 [15]. In our experiments, Prover9 performed poorly on the large problems generated by Sledgehammer. It could be effective in conjunction with an advanced and selective relevance filter.

We could also run multiple instances of a theorem prover with different heuristics. This is not necessary with Vampire, which attempts a variety of heuristics in separate time slices. It could be particularly effective with E, but designing suitable heuristics requires highly specialised skills.

3 Evaluation

In their Judgement Day study, Böhme and Nipkow [8] evaluated Sledgehammer with E, SPASS, and Vampire on 1240 provable proof goals arising in seven representative Isabelle theories:

    Arrow   Arrow's impossibility theorem
    NS      Needham-Schroeder shared-key protocol
    Hoare   Completeness of Hoare logic with procedures
    Jinja   Type soundness of a subset of Java
    SN      Strong normalisation of the typed λ-calculus with de Bruijn indices
    FTA     Fundamental theorem of algebra
    FFT     Fast Fourier transform

Sledgehammer has been developed further since they ran their experiments. In particular, it now communicates with ATPs using full first-order logic instead of clause form, adds SInE to the collection of ATPs, and employs the latest versions of SPASS, Vampire, and Metis. To account for these changes, we ran the Judgement Day benchmark suite on the same hardware as Böhme and Nipkow but with the latest version of Sledgehammer and of the Isabelle theories.

When running the four ATPs in parallel for 120 seconds, followed by Metis with a 30-second time limit, Sledgehammer now solves 52% of the goals (compared with 48% in Böhme and Nipkow). The table below gives the success rates for each ATP and theory.

                   Arrow   NS   Hoare  Jinja   SN   FTA   FFT  |  Avg.
    SInE 0.4        18%   22%    43%    31%   61%   53%   17%  |  40%
    E 1.0           19%   39%    45%    33%   66%   57%   17%  |  44%
    SPASS 3.7       30%   35%    43%    32%   59%   58%   17%  |  44%
    Vampire 1.0     36%   40%    50%    35%   63%   60%   17%  |  47%
    Together        43%   45%    54%    41%   68%   65%   26%  |  52%

² See http://www.cs.miami.edu/~tptp/TPTP/BustedAsUnsound.html.

      hence x ⊆ space M by (metis sets_into_space lambda_system_sets)
      hence space M − (space M − x) = x by (metis double_diff equalityE)
      thus space M − x ∈ lambda_system M f using x by (force simp add: lambda_system_def)
    qed

Each of the intermediate facts is proved by a call to Metis that was generated using Sledgehammer. While the example features a linear progression of facts, Isar proofs can also be nested to any depth.

Isar also supports calculational reasoning [4]. A chain of reasoning steps, connected by familiar relations such as =, ≤, and <, can be written with separate proofs for each step of the calculation. Once again, if the user can see the intermediate stages of the transformation, then the proof of each step can easily be found. The example below illustrates this type of reasoning:

    proof ...
      have f (u ∩ (x ∩ y)) + f (u − x ∩ y) =
           (f (u ∩ (x ∩ y)) + f (u ∩ y − x)) + f (u − y)
        by (metis class_semiring.add_a ey)
      also have ... = (f ((u ∩ y) ∩ x) + f (u ∩ y − x)) + f (u − y)
        by (metis Int_commute Int_left_commute)
      also have ... = f (u ∩ y) + f (u − y)
        using fx Int y u by auto
      also have ... = f u
        by (metis fy u)
      finally show f (u ∩ (x ∩ y)) + f (u − x ∩ y) = f u .
    qed

Top-down proof development is greatly assisted by a trivial Isar feature: the ability to omit proofs. Where a proof is required, the user may simply insert the word sorry. Isabelle then regards the theorem as proved.⁴ The user can then check that the newly introduced proposition indeed suffices to prove the next proposition in the development. A difficult proof can develop as a series of propositions, each initially proved using sorry but eventually using either Sledgehammer, an automatic tactic, or a nested proof development of the same form. Progress in such a proof can be measured in terms of the difficulty of the propositions that lack real proofs. Although we can never be certain that a proof development can be completed until the very end, the ability to write sorry in place of a proof reduces the risk of discovering that a lemma is useless only after spending weeks proving it.

In January 2010, as part of its new M.Phil. programme, the University of Cambridge offered a lecture course on Isabelle [22]. The course materials included almost no information about the low-level tactics that had been the mainstay of Isabelle proofs for nearly 20 years. Only two of the 12 lectures were devoted to Isar structured proofs, and they took a novel approach: rather than proceeding methodically through the Isar fundamentals, the lectures presented the outer skeleton of a proof, with crucial sections replaced by sorry. They described the idea of trying to eliminate each sorry using either Sledgehammer or some automatic tactic. Practical work submitted by the students later demonstrated that several of them had learnt how to write complex, well-structured proofs. We were happy to reassure them that submitting work generated largely by Sledgehammer was by no means cheating!

⁴ The existence of sorry does not compromise Isabelle's soundness, because it is only permitted during interactive sessions. A theory file containing an occurrence of sorry may not be imported by another theory.

[4] Gertrud Bauer and Markus Wenzel. Calculational reasoning revisited (an Isabelle/Isar experience). In Richard J. Boulton and Paul B. Jackson, editors, Theorem Proving in Higher Order Logics: TPHOLs 2001, LNCS 2152, pages 75–90. Springer, 2001. Online at http://link.springer.de/link/service/series/0558/tocs/t2152.htm.
[5] Christoph Benzmüller, Lawrence C. Paulson, Frank Theiss, and Arnaud Fietzke. LEO-II – A cooperative automatic theorem prover for higher-order logic. In Alessandro Armando, Peter Baumgartner, and Gilles Dowek, editors, Automated Reasoning – 4th International Joint Conference, IJCAR 2008, LNAI 5195, pages 162–170. Springer, 2008.
[6] Christoph Benzmüller and Volker Sorge. OANTS – An open approach at combining interactive and automated theorem proving. In Manfred Kerber and Michael Kohlhase, editors, Symbolic Computation and Automated Reasoning, pages 81–97. A. K. Peters, 2000.
[7] Marc Bezem, Dimitri Hendriks, and Hans de Nivelle. Automatic proof construction in type theory using resolution. Journal of Automated Reasoning, 29(3–4):253–275, 2002.
[8] Sascha Böhme and Tobias Nipkow. Sledgehammer: Judgement day. In Jürgen Giesl and Reiner Hähnle, editors, Automated Reasoning
(IJCAR 2010), LNCS 6173, pages 107–121. Springer, 2010.
[9] Alonzo Church. A formulation of the simple theory of types. Journal of Symbolic Logic, 5:56–58, 1940.
[10] M. J. C. Gordon and T. F. Melham. Introduction to HOL: A Theorem Proving Environment for Higher Order Logic. Cambridge University Press, 1993.
[11] Kryštof Hoder. SInE (Sumo Inference Engine). http://www.cs.man.ac.uk/~hoderk/sine/.
[12] Joe Hurd. Integrating Gandalf and HOL. In Yves Bertot, Gilles Dowek, André Hirschowitz, Christine Paulin, and Laurent Théry, editors, Theorem Proving in Higher Order Logics: TPHOLs '99, LNCS 1690, pages 311–321. Springer, 1999.
[13] Joe Hurd. First-order proof tactics in higher-order logic theorem provers. In Myla Archer, Ben Di Vito, and César Muñoz, editors, Design and Application of Strategies/Tactics in Higher Order Logics, number NASA/CP-2003-212448 in NASA Technical Reports, pages 56–68, September 2003.
[14] David McAllester. Ontic: A knowledge representation system for mathematics. In Ewing Lusk and Ross Overbeek, editors, 9th International Conference on Automated Deduction, LNCS 310, pages 742–743. Springer, 1988.
[15] William McCune. Prover9 and Mace4. http://www.cs.unm.edu/~mccune/prover9/.
[16] Andreas Meier. TRAMP: Transformation of machine-found proofs into natural deduction proofs at the assertion level (system description). In David McAllester, editor, Automated Deduction – CADE-17 International Conference, LNAI 1831, pages 460–464. Springer, 2000.
[17] Jia Meng and Lawrence C. Paulson. Translating higher-order clauses to first-order clauses. Journal of Automated Reasoning, 40(1):35–60, 2008.
[18] Jia Meng and Lawrence C. Paulson. Lightweight relevance filtering for machine-generated resolution problems. Journal of Applied Logic, 7(1):41–57, 2009.
[19] Jia Meng, Claire Quigley, and Lawrence C. Paulson. Automation for interactive proof: First prototype. Information and Computation, 204(10):1575–1596, 2006.
[20] Tobias Nipkow. A tutorial introduction to structured Isar proofs. http://isabelle.in.tum.de/dist/Isabelle/doc/isar-overview.pdf.
[21] Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. Isabelle/HOL: A Proof Assistant for Higher-Order Logic. Springer, 2002. LNCS 2283.
[22] Lawrence C. Paulson. Interactive formal verification. http://www.cl.cam.ac.uk/teaching/0910/L21/. Lecture course materials.
[23] Lawrence C. Paulson. The foundation of a generic theorem prover. Journal of Automated Reasoning, 5(3):363–397, 1989.
[24] Lawrence C. Paulson. Set theory for verification: I. From foundations to functions. Journal of Automated Reasoning, 11(3):353–389, 1993.
[25] Lawrence C. Paulson. Set theory for verification: II. Induction and recursion. Journal of Automated Reasoning, 15(2):167–215, 1995.