Careful Ranking of Multiple Solvers with Timeouts and Ties

Allen Van Gelder
http://www.cse.ucsc.edu/avg
University of California, Santa Cruz, CA 95064

Abstract. In several fields, Satisfiability being one, there are regular competitions to compare multiple solvers in a common setting. Because some benchmarks of interest are too difficult for all solvers to complete within the available time, time-outs occur and must be considered.

Through some strange evolution, time-outs became the only factor that was considered in evaluation. Previous work in SAT 2010 observed that this evaluation method is unreliable and lacks a way to attach statistical significance to its conclusions. However, the proposed alternative was quite complicated and is unlikely to see general use.

This paper describes a simpler system, called careful ranking, that permits a measure of statistical significance, and still meets many of the practical requirements of an evaluation system. It incorporates one of the main ideas of the previous work: that outcomes had to be freed of assumptions about timing distributions, so that non-parametric methods were necessary. Unlike the previous work, it incorporates ties.

The careful ranking system has several important non-mathematical properties that are desired in an evaluation system: (1) the relative ranking of two solvers cannot be influenced by a third solver; (2) after the competition results are published, a researcher can run a new solver on the same benchmarks and determine where the new solver would have ranked; (3) small timing differences can be ignored; (4) the computations should be easy to understand and reproduce. Voting systems proposed in the literature lack some or all of these properties.

A property of careful ranking is that the pairwise ranking might contain cycles. Whether this is a bug or a feature is a matter of opinion. Whether it occurs among leaders in practice is a matter of experience.

The system is implemented and has been applied to the SAT 2009 Competition. No cycles occurred among the leaders, but there was a cycle among some low-ranking solvers. To measure robustness, the new and current systems were computed with a range of simulated time-outs, to see how often the top rankings changed. That is, times above the simulated time-out are reclassified as time-outs and the rankings are computed with this data. Careful ranking exhibited many fewer changes.

1 Introduction and Overview

Empirical comparison of computational performance is an important technique for advancing the state of the art in software. In Propositional Satisfiability and several related fields, testing programs on benchmarks is complicated by the fact that time limits must be set, because some benchmarks of interest are too difficult for all programs to complete within the available time. Programs can fail to complete a test due to exhausting time or some other resource, often memory. There is no clearly correct way to integrate the results of failed tests and completed tests to compute a single "figure of merit."

To focus the discussion, let us assume that the property we wish to measure is speed of solution, and that we are evaluating the results of a SAT competition. If every program could complete every test, we would simply add up the times for each program and rank them according to this total, with smallest being best. From this point of view, time-outs and other failures are defects in the experiment.

In actuality, there is not time for every program to complete every test, and failures do occur. This leads to what is called censored data in the literature: there "really" is a value for the time the program would have taken on a benchmark; we just did not find out what that value is. The question is, what is a good way to rank the programs, based on the data that is available? Logically, we would want this ranking method to produce the same results as the ideal experiment, to the extent possible.

The ranking method that has been used in recent SAT competitions, which we shall call solution-count ranking,¹ is to set some time limit ad hoc, and simply count how many tests are successfully completed. Through some strange evolution, time-outs, which are the manifestations of defects in the experiment, became the only factor that was considered in evaluation. Total CPU time is used as a tie-breaker only if solution counts are equal.

Previous work by Nikolic observed that solution-count ranking is unreliable and lacks a way to attach statistical significance to its conclusions [Nik10]. However, the proposed alternative was quite complicated and had some practical drawbacks. The purpose of this paper is to describe and propose a simpler system that meets the practical requirements for ranking solvers in
a SAT competition (endorsed by a survey of solver developers and users),¹ and also gives information about the statistical significance of the results, or lack thereof.

Definition 1 Practical requirements:
1. The relative ranking of two solvers cannot be influenced by a third solver.
2. After the competition results are published, a researcher can run a new solver on the same benchmarks and determine where the new solver would have ranked.
3. Small timing differences can be ignored.
4. The computations should be easy to understand and reproduce.

One earlier method, called the purse method,² lacked properties (1) and (2) and fell into disfavor after a few trials.

¹ See , where it is called "Lexicographical NBSOLVED, sum ti."
² See .

The methodology we propose, called careful ranking, incorporates one of the main ideas of Nikolic: that outcomes must be free of assumptions about timing distributions, because we have no information about these distributions. Non-parametric methods are necessary. Unlike the previous work, our proposal incorporates ties to account for timing differences that are considered inconsequential for ranking purposes.

The careful ranking system has the important non-mathematical properties given in Definition 1. The main ingredient of careful ranking is that all pairs of competitors are compared in isolation, leading to a pair of "scores" that sum to zero. A large positive score indicates a significantly faster solver. The null hypothesis is that both solvers are equally fast "overall," or "in the long run." Under this hypothesis, the expected value of the score is zero. The difference between zero and the observed score may be converted into a standard measure of statistical significance. For a k-way competition, there are $k(k-1)/2$ pairwise matches. The results are expressed with a dominance matrix, as described in Section 5. The final ranking is extracted from this matrix.

There is a meta-ranking question to be addressed. How can we compare various ranking methods, since we do not know the "true answers"? The method we propose, and use, is to measure sensitivity to changes in the time limit. We do not know what would have happened if we had used a larger time limit. But what would have happened under all shorter time limits can be determined from the available data.

The careful ranking system is implemented³ and has been applied to the SAT
2009 Competition. The implementation consists of csh scripts, sed, and awk, which should be portable. No cycles occurred among the leaders, but there was a cycle among some low-ranking solvers. To measure robustness, the new and current systems were computed with a range of simulated time-outs, to see how often the top rankings changed. That is, times above the simulated time-out are reclassified as time-outs and the rankings are computed with this data.

2 Related Work

There is a large body of work on various aspects of experimental comparisons. We restrict ourselves to immediately related work on ranking solvers.

Non-mathematical considerations for a scoring method are discussed in general terms by Le Berre and Simon [LBS04], and influenced several aspects of the method proposed here. One such aspect is our provision for many timing differences to be treated as a tie, because it appears that many people consider calling one solver the winner in these cases to be a distortion. The reaction to this perceived distortion has been to reduce the importance of speed to nearly nothing, as long as the solver stays within the time limit. We hope that treating "minor" timing differences as ties will make the technique more acceptable than prior techniques that used time as the major consideration.

³ Code is at .
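To make concrete how little speed matters under the solution-count ranking of Section 1, and how the simulated time-outs used to measure robustness are applied, here is a minimal Python sketch. It is an illustration only, not the competition's code and not the author's csh/sed/awk implementation. The function name `solution_count_rank` and the toy data are hypothetical, and we assume that the CPU-time tie-breaker sums only the times of solved benchmarks.

```python
import math

INF = math.inf  # stands for a failure of any kind (time-out, memory, ...)


def solution_count_rank(times, limit):
    """Rank solvers by the number of benchmarks solved within `limit`.

    times: dict mapping solver name -> list of solution times (INF = failure).
    limit: the (possibly simulated) time limit; any time above it is
           reclassified as a time-out, as in the robustness experiments.
    Ties in the solved count are broken by total CPU time on solved
    benchmarks (an assumption about how the tie-breaker is summed).
    """
    def key(name):
        solved = [t for t in times[name] if t <= limit]
        return (-len(solved), sum(solved))  # more solved first, then less time

    return sorted(times, key=key)


# Hypothetical toy data: three solvers on three benchmarks.
times = {
    "A": [1, 50, INF],
    "B": [40, 45, 90],
    "C": [2, 3, INF],
}
print(solution_count_rank(times, 100))  # ['B', 'C', 'A']
print(solution_count_rank(times, 60))   # ['C', 'A', 'B']
```

Lowering the simulated limit from 100 to 60 moves B from first to last in this toy example, because a single solve is reclassified as a time-out; speed enters only through the tie-breaker.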
Brglez and co-authors [BLS05, BO07] replicate instances into classes to gather statistics. Their goals are quite different from ranking a competition. Nikolic [Nik10] extends these ideas to compare more than two programs. The non-mathematical, practical issues mentioned in Definition 1 are not considered in these papers.

Pulina conducted an extensive empirical evaluation of several scoring methods [Pul06]. One criterion he used is similar to the one we use: decreasing the time limit and measuring stability. Our proposed method is significantly different from those he analyzed. Most or all of the comparison methods he studied lacked independence from a third solver. Thus a later researcher could not see where new work fit into a previous competition.

Pulina introduced the idea of viewing the ranking problem as a voting situation: each benchmark "votes" for the solvers (the "candidates") by a preference ballot that ranks them by solution time. This is a very attractive idea, but unfortunately, none of the well-known proposed voting methods satisfy the criteria of Definition 1, as elaborated further at the URL given there. There is a vast literature on this subject, as surveyed by Levin and Nalebuff [LN95], and more recently treated by Pomerol and Barba-Romero [PBR00] and Tideman [Tid06]. A detailed comparison with all proposals would take us far afield, so we restrict attention to the Schulze method, which has enjoyed recent popularity [Sch03]. That popularity is not surprising, because the Schulze method, unlike many other proposals (such as Borda), permits voters to vote equal preferences among subsets of the candidates (e.g., D = 1, (A, C) = 2, B = 7 is a valid ballot in a field of 10).

Suppose a competition is being run with six solvers and 63 benchmarks with Schulze ranking (the example may use many combinations of numbers), and the following events transpire. After 60 benchmarks have been run on all solvers, the Schulze ranking is computed and solver A is uniquely winning. On each of the last three benchmarks, solver A has the best performance of any solver and solver D has the worst performance of any solver. (For example, solver D might time out on the last three benchmarks.) However, when the Schulze ranking is computed using all 63 benchmarks, D wins. No, this is not a typo. See Appendix B of [Sch03] for complete details.⁴ It is impossible to imagine that any organizers of a competition would adopt the Schulze method if they knew about this possibility.

Moreover, this is not a quirk in the Schulze method. It is known to be present in a large class of methods that satisfy the Condorcet principle [LN95, Tid06]. The phenomenon is known as the no-show paradox [Mou88], because solver A would have been better off without the "support" of the last three benchmarks, on all of which A was the clear winner.

The above example is possible under Schulze ranking and many other voting systems because they do not satisfy criterion (1) in Definition 1: other solvers should not be able to affect the relative ranking of solvers A and D.

⁴ The example cited has some pairwise ties among candidates, for simplicity of presentation, but these ties can be removed by "fuzzing" without changing the outcomes.

3 What is a Tie?

Say we are comparing solvers (R, S) on a set of benchmarks, $\{B_i \mid i = 1, \ldots, n\}$, with time limit $\tau$. The data is two lists of numbers, $t_i(R)$ and $t_i(S)$, the solution times of the two solvers on $B_i$. Numbers are floating point and include Inf to denote a failure of any kind. (We choose not to distinguish among failure reasons, except that a wrong answer means the solver is disqualified and its matches are not scored.) All data other than Inf is between 0 and $\tau$, and we call these finite times.

We interpret the lists $t_i(R)$ and $t_i(S)$ as a series of n mini-matches, each with a stake of one point. A tie awards 0 to each solver. A win for R gives R a score of 1 and gives S a score of −1, and the reverse if S wins. Clearly if $t_i(R) = t_i(S)$, the result is a tie. The question is what other outcomes should be considered a tie.

The current method treats any pair of finite times as a tie; the only win is a finite time vs. Inf. The opposite extreme is that any $t_i(R) < t_i(S)$ is a win for R. Nikolic performed a theoretical analysis that depended on a complete absence of ties, so he "discarded" benchmarks where all times were under 5 seconds, which got rid of all the exact equalities with finite times, and then treated any finite time difference as a win [Nik10]. This is essentially the same as saying that any pair of times under 5 seconds is a tie.

Our thesis is that some time differences should be considered "inconsequential" in the sense that someone trying to select the better solver between R and S for use in an application would not be influenced by these time differences. We hypothesize that on longer runs,
larger time differences would be considered inconsequential, so we want to define a tie zone whose width grows as run times get longer. We also believe that most people agree that below some threshold, all time differences are inconsequential. The user decides where to set this threshold, which we call noise, and which is the only user-specified parameter needed to specify the tie zone.

The growth rate we choose is founded in recurrent-event theory. We model the solver's computation as a long series of search events with independent outcomes. The probability that a search event has a successful outcome is very small, and the solver terminates upon the first successful search event. This is the well-known Poisson process. The standard deviation of the time to termination is proportional to the square root of the average time to termination. If two solvers have the same (theoretical) average time to termination on benchmark $B_i$, then their time difference is a random variable with mean zero and standard deviation proportional to the square root of their common average. We propose that observed time differences less than some number of standard deviations should be considered as ties, because they do not provide compelling evidence that one solver's average is really shorter than the other's. This is purely a heuristic model, of course.

Once we accept the idea that it is sensible for the tie zone to grow proportionally to $\sqrt{(t_i(R)+t_i(S))/2}$, the square root of the average of the two observed solving times, all that remains is to choose a constant of proportionality. The user makes this choice indirectly by specifying a scalar parameter called noise.
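[Figure 1: plot titled "Tie Zone", with "Average Solver Time" on the horizontal axis and "Faster, Average, Slower Times" on the vertical axis.]
Fig. 1. Tie zone for noise = 1 minute; times in minutes. Lower curve: faster solver time; upper curve: slower solver time; middle line: average of upper and lower curves.

As stated above, the intuition is that all solution times at or below the noise level should be treated as indistinguishable. Figure 1 shows the tie zone for noise = one minute. Any pair of times that are both under one minute falls into the tie zone. To generate the desired tie zone, we define:

$$\delta = \sqrt{\frac{\text{noise}}{2}}\;\sqrt{\frac{t_i(R)+t_i(S)}{2}}.$$

Then the "tie zone" extends from $(t_i(R)+t_i(S))/2 - \delta$ to $(t_i(R)+t_i(S))/2 + \delta$. For R to win it is necessary that

$$t_i(R) < \frac{t_i(R)+t_i(S)}{2} - \delta.$$

Since $t_i(R) \ge 0$, S is assured of (at least) a tie whenever $t_i(S) \le \text{noise}$, as was desired by the user. (At the extreme $t_i(R) = 0$, $t_i(S) = \text{noise}$, we get $\delta = \text{noise}/2$, so the right-hand side is exactly 0 and the strict inequality fails.)

4 Pairwise Matches

Say we are comparing solvers (R, S) on a set of benchmarks, $\{B_i \mid i = 1, \ldots, n\}$, with time limit $\tau$. The data is two lists of numbers, $t_i(R)$ and $t_i(S)$. As described in Section 3, we interpret the lists $t_i(R)$ and $t_i(S)$ as a series of n mini-matches, with each outcome for R being 1, 0, or −1. The (algebraic) total is the raw score for the match, denoted raw(R, S). For simplicity, all pairs are processed, so S is compared with R at some point to get raw(S, R), which of course equals −raw(R, S).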
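The tie zone and the raw score of a pairwise match can be sketched in Python as follows. The formula for δ and the win condition follow Section 3; the helper names `tie_delta`, `mini_match`, and `raw_and_decisive` are hypothetical. One rule is an assumption not spelled out in the text: when exactly one solver fails, the sketch substitutes the time limit τ for Inf before applying the tie-zone test, so a solver that finishes well before the limit beats one that failed, while two failures tie. Disqualification for wrong answers is not modeled.

```python
import math

INF = math.inf


def tie_delta(t_r, t_s, noise):
    """Half-width of the tie zone: sqrt(noise/2) * sqrt((t_r + t_s)/2)."""
    return math.sqrt((noise / 2.0) * ((t_r + t_s) / 2.0))


def mini_match(t_r, t_s, noise, limit):
    """Outcome for R on one benchmark: +1 (R wins), 0 (tie), -1 (S wins)."""
    if t_r == INF and t_s == INF:
        return 0  # both failed: tie
    # Assumption: a failure is compared as if it took exactly `limit`.
    t_r = min(t_r, limit)
    t_s = min(t_s, limit)
    mid = (t_r + t_s) / 2.0
    delta = tie_delta(t_r, t_s, noise)
    if t_r < mid - delta:
        return 1
    if t_s < mid - delta:
        return -1
    return 0


def raw_and_decisive(times_r, times_s, noise, limit):
    """Return (raw(R, S), decisive(R, S)) over all benchmarks."""
    outcomes = [mini_match(a, b, noise, limit) for a, b in zip(times_r, times_s)]
    raw = sum(outcomes)
    decisive = sum(1 for o in outcomes if o != 0)
    return raw, decisive


# Times in seconds, so Figure 1's noise of one minute is noise=60.
print(mini_match(0, 60, noise=60, limit=10000))  # 0: both within the noise level
print(raw_and_decisive([100, 1000, 5000, INF], [1000, 100, INF, INF],
                       noise=60, limit=10000))   # (1, 3)
```

Swapping the two argument lists negates the raw score and leaves the decisive count unchanged, matching raw(S, R) = −raw(R, S).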
The value of raw(R, S) can be used to test the null hypothesis, which is that R and S have an equal probability of winning a random mini-match. We also need the number of decisive (non-tie) mini-matches, denoted decisive(R, S). Then the Student t parameter is given by

$$\text{Student } t \approx \frac{\mathrm{raw}(R,S)}{\sqrt{\mathrm{decisive}(R,S)}} \qquad (1)$$

which expresses the raw score in standard deviations. Statistically, the match is modeled as decisive(R, S) fair coin flips. If decisive(R, S) is large, the distribution is close to Gaussian. We are certainly justified in rejecting the null hypothesis when $|t| \ge 2$, without figuring out the exact value of p, which is the probability of observing a t-value this large or larger. In this case $p < 0.03$.

To summarize, if Student $t \ge 2$ in (1), we may conclude that R is faster than S with high confidence, on a space of benchmarks for which the actual benchmarks used are representative. If Student $t = 1$ we may have "medium" confidence, because a value this large or larger would occur with probability about 0.16 if the solvers were really equally fast on average.

5 Competition Ranking

We propose to create a k-way competition ranking of solvers $S_1, \ldots, S_k$ by forming a $k \times k$ matrix M in which

$$M_{i,j} = \begin{cases} 1 & \text{if } \mathrm{raw}(S_i,S_j) > 0,\\ 0.5 & \text{if } \mathrm{raw}(S_i,S_j) = 0,\\ 0 & \text{if } \mathrm{raw}(S_i,S_j) < 0, \end{cases} \qquad i \ne j \qquad (2)$$

with $M_{i,i} = 0$ on the diagonal. This matrix can be interpreted as specifying a directed graph (also called M), where solvers are vertices and an edge from $S_i$ to $S_j$ exists wherever $M_{i,j} \ne 0$. If $M_{i,j} = M_{j,i} = 0.5$, there are edges in both directions. If this graph is acyclic, it defines a total order among the solvers, which we call the dominance order.

In practice, we usually are not concerned about establishing a total order among all participants; it is sufficient if there is a total order among the leaders, perhaps the 5–6 top ranks. First, let us focus on the case that the leaders do not have any tied matches, not even with a non-leader. Things are slightly more complicated otherwise.

In the case of no leader ties, adding up the rows of the leaders provides a "definitive" ranking. That is, if M restricted to the leaders defines an acyclic graph, each row sum is unique, and ranking the leaders by row sums is unambiguous.

However, it is possible, even in the case of no leader ties, that the graph has a cycle [Nik10]. This possibility is present because ties are not transitive in mini-matches. In other words, on a specific benchmark $B_i$, it is possible that S1 ties with S2 and S2 ties with S3, but S1 wins or loses
against S3. We can easily create a set of timings on three benchmarks so that raw(S1, S2) = 1, raw(S2, S3) = 1, and raw(S3, S1) = 1. Longer and more complex cyclic structures can be constructed, as well. If a cyclic structure is present, then at least two row sums must be equal (still in the case of no leader ties in matches).

The conclusion from the preceding discussion is that row sums can provide quick hints and are easily interpreted, but may be inconclusive. If used carelessly in the presence of ties, they can be misleading. On the positive side, we expect them to be adequate for most situations. But "most" is not good enough, so we need a procedure that always gives an unambiguous result.

Example 2 This small example shows some complications that can arise, involving pairwise ties and cycles. Let us assume that the time-out is 15 and that a difference of 3 is a winning margin in the range of times shown below, while a difference of 2 is a tie. The first table shows times for three solvers on three benchmarks; the second shows the raw scores; the third shows the dominance matrix.

    Times       S1    S2    S3
    B1          10    13    14
    B2          14    12    10
    B3          12    11    14

    Raw scores  S1    S2    S3
    S1           0     1     0
    S2          −1     0     1
    S3           0    −1     0

    Dominance   S1    S2    S3
    S1           0     1    0.5
    S2           0     0     1
    S3          0.5    0     0

Although S1 beats S2 and S2 beats S3, still S3 ties S1, so all three are cyclically related. However, no row sums are equal.
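Example 2 can be checked mechanically. The sketch below uses the example's simplified tie rule (a difference of 3 or more wins, smaller differences tie) in place of the tie zone of Section 3, builds the raw scores and the dominance matrix of (2) with a zero diagonal, and evaluates the Student t of (1) with a one-sided Gaussian p-value. All function names are hypothetical, and the normal approximation is only reasonable when decisive(R, S) is large, which it is not in this tiny example.

```python
import math

# Example 2: times[solver][benchmark], time-out 15 (no failures occur).
times = [
    [10, 14, 12],  # S1 on B1, B2, B3
    [13, 12, 11],  # S2
    [14, 10, 14],  # S3
]
MARGIN = 3  # Example 2's rule: a difference of 3 or more is a win


def outcome(a, b):
    """+1 if time a wins, -1 if time b wins, 0 for a tie."""
    if b - a >= MARGIN:
        return 1
    if a - b >= MARGIN:
        return -1
    return 0


def raw_matrix(times):
    k = len(times)
    return [[sum(outcome(a, b) for a, b in zip(times[i], times[j]))
             for j in range(k)] for i in range(k)]


def dominance_matrix(raw):
    """Equation (2), with diagonal entries set to 0."""
    k = len(raw)

    def entry(i, j):
        if i == j:
            return 0
        return 1 if raw[i][j] > 0 else (0.5 if raw[i][j] == 0 else 0)

    return [[entry(i, j) for j in range(k)] for i in range(k)]


def student_t(raw, decisive):
    """Equation (1): raw score measured in standard deviations."""
    return raw / math.sqrt(decisive) if decisive else 0.0


def one_sided_p(t):
    """P(Z >= t) for a standard Gaussian Z (normal approximation)."""
    return 0.5 * math.erfc(t / math.sqrt(2))


raw = raw_matrix(times)
M = dominance_matrix(raw)
print(raw)                          # [[0, 1, 0], [-1, 0, 1], [0, -1, 0]]
print(M)                            # [[0, 1, 0.5], [0, 0, 1], [0.5, 0, 0]]
print([sum(row) for row in M])      # [1.5, 1, 0.5]: no two row sums equal
print(round(one_sided_p(2.0), 3))   # 0.023, below the 0.03 quoted for t >= 2
print(round(one_sided_p(1.0), 2))   # 0.16, the "medium" confidence case
```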
Treating M simply as a connected, directed graph, its vertices (the solvers) can be partitioned into strongly connected components. (For small graphs, the famous linear-time procedure is unnecessary; matrix multiplications and additions suffice.) The component graph, obtained by collapsing every strongly connected component to a single node, defines a total order, because every pair of solvers is joined by an edge in at least one direction.

We propose that all solvers living in the same strongly connected component (SCC) of the graph M described in (2) shall be equally ranked; otherwise the relative ranking is determined by the component graph. This policy provides an unambiguous specification for all situations.

If a tie-break is necessary (e.g., an indivisible trophy is awarded), we recommend that all solvers in a single SCC shall be ranked among themselves by the sums of their raw scores within the SCC. That is, if $S_1, \ldots, S_k$ comprise an SCC, then

$$\mathrm{TieBreak}(S_i) = \sum_{j=1}^{k} \mathrm{raw}(S_i, S_j)$$

This amounts to treating each mini-match among $S_1, \ldots, S_k$ as a single-point contest between two solvers in a round-robin event, similar to teams in a league playing a season, so we call this the round-robin tie-break method. It is also known as Copeland's method in the voting-system literature [PBR00]. The advantage of this method is that it is easily understood and familiar. Its disadvantage is that the comparative ranking of S1 and S2 depends on mini-matches involving other solvers in the SCC.
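The SCC policy and the round-robin tie-break can be sketched as follows. Where the text suggests matrix multiplications and additions, this sketch computes the same reachability relation with Warshall's transitive-closure algorithm, a swapped-in but equivalent choice for small graphs. Because every pair of solvers is joined by at least one edge, a solver in a higher component reaches strictly more solvers, so components can be ordered by the size of their reachable sets. The function names are hypothetical; the data are Example 2's dominance matrix and raw scores.

```python
def reachability(M):
    """R[i][j] is True if solver j is reachable from solver i in the graph of (2)."""
    k = len(M)
    R = [[i == j or M[i][j] != 0 for j in range(k)] for i in range(k)]
    for m in range(k):  # Warshall's algorithm
        for i in range(k):
            if R[i][m]:
                for j in range(k):
                    if R[m][j]:
                        R[i][j] = True
    return R


def careful_rank(M, raw):
    """Return ranked groups (one list per SCC, best first).

    Within each SCC, solvers are ordered by the round-robin tie-break:
    the sum of their raw scores against the other members of the SCC.
    """
    k = len(M)
    R = reachability(M)
    groups = {}
    for i in range(k):
        scc = frozenset(j for j in range(k) if R[i][j] and R[j][i])
        groups[scc] = sum(R[i])  # number of solvers reachable from this SCC
    ranked = sorted(groups, key=groups.get, reverse=True)

    def tie_break(i, scc):
        return sum(raw[i][j] for j in scc)

    return [sorted(scc, key=lambda i: tie_break(i, scc), reverse=True)
            for scc in ranked]


# Example 2 (solvers S1, S2, S3 are indices 0, 1, 2).
M = [[0, 1, 0.5], [0, 0, 1], [0.5, 0, 0]]
raw = [[0, 1, 0], [-1, 0, 1], [0, -1, 0]]
print(careful_rank(M, raw))  # [[0, 1, 2]]: one SCC, tie-broken as S1, S2, S3
```

The result agrees with the round-robin outcome stated for Example 2: all three solvers share one SCC, and the tie-break orders them S1, S2, S3.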
Table 1. The dominance matrix for 16 solvers in the final phase, based on careful ranking.

                        1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
     1 CircUs           0  0  0  0  0  0  0  1  1  1  1  1  0  0  1  0
     2 LySAT i          1  0  1  1  1  0  1  1  1  1  1  1  0  1  1  0
     3 MXC              1  0  0  0  0  0  1  1  1  1  1  1  0  1  1  0
     4 ManySAT 1.1      1  0  1  0  0  0  1  1  1  1  1  1  0  1  1  0
     5 MiniSAT 09z      1  0  1  1  0  0  1  1  1  1  1  1  0  1  1  0
     6 MiniSat 2.1      1  1  1  1  1  0  1  1  1  1  1  1  0  1  1  0
     7 Rsat             1  0  0  0  0  0  0  1  1  1  1  1  0  1  1  0
     8 SAT07 Rsat       0  0  0  0  0  0  0  0  1  0  1  1  0  0  0  0
     9 SAT07 picosat    0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
    10 SATzilla         0  0  0  0  0  0  0  1  1  0  1  1  0  0  1  0
    11 SApperloT        0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
    12 clasp            0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0
    13 glucose          1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  0
    14 kw               1  0  0  0  0  0  0  1  1  1  1  1  0  0  0  0
    15 minisat cumr     0  0  0  0  0  0  0  1  1  0  1  1  0  1  0  0
    16 precosat         1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  0

In practice, we expect SCCs to be about 2–4 solvers. Outcomes perceived as being "unfair" seem unlikely, because all the solvers involved are peers. In Example 2, the round-robin tie-break makes S1 ≻ S2 ≻ S3. Notice that tweaking the times by 0.1 does not change the result, using this method. However, with the solution-count method, S1 and S2 are tied with the times as shown, but tweaking can make either one the winner.

6 Results on the SAT 2009 Competition

The final round of the Application section in the SAT 2009 Competition⁵ was conducted with a time limit of 10000 seconds, used 292 benchmarks, and involved 16 solvers. The organizers were Daniel Le Berre, Laurent Simon, and Olivier Roussel. The discussion uses abbreviated solver names; please see the web page for complete names. The solvers were ranked for the competition using the solution-count ranking method described in Section 1.

We computed the rankings that would have resulted using careful ranking. The dominance matrix discussed in Section 5 is shown in Table 1. Examination of this matrix shows that solvers 1, 10, 14, and 15 are in one strongly connected component, so they share ranks 9–12, according to Section 5. All other solvers are not in any cycles, so they have unique ranks.

⁵ See .

Table 2. Numbers of changes in top three ranks for two ranking methods and various time limits (seconds).
    Time range     Solution-count   Careful rank
    1600–2000            8               4
    2000–4000           10               0
    4000–6000            4               0
    6000–8000            0               0
    8000–10000           1               0

6.1 Robustness of Ranking

We analyzed the robustness of the ranking methods by counting how many times there was some change in the top three ranks as the time limit was varied continuously. We note that precosat stayed in first place for all time limits of 4000 seconds and above, in both ranking methods. Table 2 summarizes the numbers of changes in various ranges. (Returning to an earlier permutation is considered a change, too.) It seems clear that careful ranking is more robust by this criterion.

We offer this intuitive explanation for why careful ranking gives less variation as the time limit changes. With solution-count ranking, a mini-match victory is only temporary as the time limit increases: S1 wins the mini-match against S2 only if S1 succeeds and S2 times out. But for a high enough time limit S2 also succeeds (in theory), and the victory is wiped out. However, with careful ranking, once the time limit is sufficiently above the solving time of S1 and S2 still has not succeeded, the victory is permanent for this mini-match.

6.2 Differences in Ranking

The two ranking methods, careful ranking and solution-count ranking, disagreed on the third-place solver with the final time limit of 10,000 seconds. MiniSat 2.1 held third place behind glucose for all time limits above 2000 seconds under careful ranking. Under the solution-count ranking, MiniSat 2.1 and LySAT i exchanged places two and three several times, with LySAT i finally taking the lead after about 8100 seconds. By the 10,000 mark MiniSat 2.1 was in sixth place.

Under the solution-count ranking, precosat and glucose were apparently "neck and neck," as they each solved 204 instances. The tie-break was on cumulative CPU time, and precosat won. Other solvers were in the 190's, well separated from the two leaders. Quite a different picture emerges under careful ranking. We show three matches with their statistics. "Std. Devs." refers to the Student t from (1).
    Winner        Loser         Raw Score   Std. Devs.   Prob. Faster
    precosat      glucose          16          1.65          0.97
    glucose       MiniSat 2.1       8          0.83          0.79
    MiniSat 2.1   LySAT i           8          0.86          0.80

In this ranking, precosat has a more convincing win than any of the others.

6.3 Tie-Break Illustration

Recall that Table 1 shows that solvers 1, 10, 14, and 15 are in one strongly connected component, sharing ranks 9–12. For purposes of illustration, we apply the round-robin tie-break procedure described in Section 5 to these four solvers, although in practice it is probably not important to break this tie. The raw scores for the solvers involved are shown below, followed by the dominance subgraph.

    Raw scores   S1    S10   S14   S15
    S1            0    −13    −9     1
    S10          13      0    −8     3
    S14           9      8     0    −1
    S15          −1     −3     1     0

[Diagram: dominance subgraph among S1, S10, S14, and S15.]

The round-robin ranking, based on the row sums, gives S14 ≻ S10 ≻ S15 ≻ S1.

7 Conclusion

This paper described a new ranking system that provides a measure of statistical significance, allows small timing differences to be treated as ties, and ensures that a pairwise comparison between two solvers is not influenced by a third solver. The latter property also allows later researchers to replicate the competition conditions and find out where their solver would have ranked. Another application of this technique is to evaluate whether software changes from one version to another caused a performance difference that is statistically significant, or whether the difference is in a range that might well just be random.

Acknowledgment

We thank Daniel Le Berre, Laurent Simon, and Olivier Roussel for making the 2009 competition data available in text form.

References

[BLS05] F. Brglez, X. Y. Li, and M. F. Stallmann. On SAT instance classes and a method for reliable performance experiments with SAT solvers. Annals of Mathematics and Artificial Intelligence, 43:1–34, 2005.
[BO07] F. Brglez and J. A. Osborne. Performance testing of combinatorial solvers with isomorph class instances. In Workshop on Experimental Computer Science, San Diego, 2007. ACM. (Co-located with FCRC 2007.)
[LBS04] D. Le Berre and L. Simon. The essentials of the SAT 2003 competition. In Proc. SAT, 2004.
[LN95] J. Levin and B. Nalebuff. An introduction to vote-counting schemes. The Journal of Economic Perspectives, 9:3–26, 1995.
[Mou88] H. Moulin. Condorcet's principle implies the no-show paradox. The Journal of Economic Theory, 45:53–64, 1988.
[Nik10] M. Nikolic. Statistical methodology for comparison of SAT solvers. In Proc. SAT, pages 158–171, 2010.
[PBR00] J.-C. Pomerol and S. Barba-Romero. Multicriterion Decision in Management: Principles and Practice. Springer, 2000.
[Pul06] L. Pulina. Empirical evaluation of scoring methods. In Third European Starting AI Researcher Symposium, 2006.
[Sch03] M. Schulze. A new monotonic and clone-independent single-winner election method. In N. Tideman, editor, Voting Matters, Issue 17, pages 9–19, Oct. 2003. URL: .
[Tid06] N. Tideman. Collective Decisions and Voting: the Potential for Public Choice. Ashgate, 2006.