
Careful Ranking of Multiple Solvers with Timeouts and Ties

Allen Van Gelder
http://www.cse.ucsc.edu/avg
University of California, Santa Cruz, CA 95064

Abstract. In several fields, Satisfiability being one, there are regular competitions to compare multiple solvers in a common setting. Because some benchmarks of interest are too difficult for all solvers to complete within available time, time-outs occur and must be considered. Through some strange evolution, time-outs became the only factor that was considered in evaluation. Previous work in SAT 2010 observed that this evaluation method is unreliable and lacks a way to attach statistical significance to its conclusions. However, the proposed alternative was quite complicated and is unlikely to see general use.

This paper describes a simpler system, called careful ranking, that permits a measure of statistical significance, and still meets many of the practical requirements of an evaluation system. It incorporates one of the main ideas of the previous work: that outcomes had to be freed of assumptions about timing distributions, so that non-parametric methods were necessary. Unlike the previous work, it incorporates ties.

The careful ranking system has several important non-mathematical properties that are desired in an evaluation system: (1) the relative ranking of two solvers cannot be influenced by a third solver; (2) after the competition results are published, a researcher can run a new solver on the same benchmarks and determine where the new solver would have ranked; (3) small timing differences can be ignored; (4) the computations should be easy to understand and reproduce. Voting systems proposed in the literature lack some or all of these properties.

A property of careful ranking is that the pairwise ranking might contain cycles. Whether this is a bug or a feature is a matter of opinion. Whether it occurs among leaders in practice is a matter of experience.

The system is implemented and has been applied to the SAT 2009 Competition. No cycles occurred among the leaders, but there was a cycle among some low-ranking solvers. To measure robustness, the new and current systems were computed with a range of simulated time-outs, to see how often the top rankings changed. That is, times above the simulated time-out are reclassified as time-outs and the rankings are computed with this data. Careful ranking exhibited many fewer changes.

1 Introduction and Overview

Empirical comparison of computational performance is an important technique for advancing the state of the art in software. In Propositional Satisfiability and several related fields, testing programs on benchmarks is complicated by the fact that time limits must be set, because some benchmarks of interest are too difficult for all programs to complete within available time. Programs can fail to complete a test due to exhausting time or some other resource, often memory. There is no clearly correct way to integrate the results of failed tests and completed tests to compute a single "figure of merit."

To focus the discussion, let us assume that the property we wish to measure is speed of solution, and we are evaluating the results of a SAT competition. If every program could complete every test, we would simply add up the times for each program and rank them according to this total, with smallest being best. From this point of view, time-outs and other failures are defects in the experiment.

In actuality, there is not time for every program to complete every test, and failures do occur. This leads to what is called censored data in the literature: there "really" is a value for the time the program would have taken on a benchmark, we just did not find out what that value is. The question is, what is a good way to rank the programs, based on the data that is available. Logically, we would want this ranking method to produce the same results as the ideal experiment, to the extent possible.

The ranking method that has been used in recent SAT competitions, which we shall call solution-count ranking,¹ is to set some time limit ad hoc, and simply count how many tests are successfully completed. Through some strange evolution, time-outs, which are the manifestations of defects in the experiment, became the only factor that was considered in evaluation. Total CPU time is used as a tie-breaker only if solution counts are equal.

Previous work by Nikolić observed that solution-count ranking is unreliable and lacks a way to attach statistical significance to its conclusions [Nik10]. However, the proposed alternative was quite complicated and had some practical drawbacks. The purpose of this paper is to describe and propose a simpler system that meets the practical requirements for ranking solvers in a SAT competition (endorsed by a survey of solver developers and users),¹ and also gives information about the statistical significance of the results, or lack thereof.

Definition 1. Practical requirements:
1. The relative ranking of two solvers cannot be influenced by a third solver.
2. After the competition results are published, a researcher can run a new solver on the same benchmarks and determine where the new solver would have ranked.
3. Small timing differences can be ignored.
4. The computations should be easy to understand and reproduce.

One earlier method, called the purse method,² lacked properties (1) and (2) and fell into disfavor after a few trials.

¹ See […], where it is called "Lexicographical NBSOLVED, sum ti."
² See […].

The methodology we propose, called careful ranking, incorporates one of the main ideas of Nikolić: that outcomes must be free of assumptions about timing distributions, because we have no information about these distributions. Non-parametric methods are necessary. Unlike the previous work, our proposal incorporates ties to account for timing differences that are considered inconsequential for ranking purposes.

The careful ranking system has the important non-mathematical properties given in Definition 1. The main ingredient of careful ranking is that all pairs of competitors are compared in isolation, leading to a pair of "scores" that sum to zero. A large positive score indicates a significantly faster solver. The null hypothesis is that both solvers are equally fast "overall," or "in the long run." The expected value of the score is zero, under this hypothesis. The difference between zero and the observed score may be converted into a standard measure of statistical significance.

For a $k$-way competition, there are $k(k-1)/2$ pairwise matches. The results are expressed with a dominance matrix, as described in Section 5. The final ranking is extracted from this matrix.

There is a meta-ranking question to be addressed. How can we compare various ranking methods, since we do not know the "true answers?" The method we propose, and use, is to measure sensitivity to changes in the time limit. We do not know what would have happened if we used a larger time limit. But what would have happened under all shorter time limits can be determined from the available data.

The careful ranking system is implemented³ and has been applied to the SAT 2009 Competition. The implementation is csh scripts, sed, and awk, which should be portable. No cycles occurred among the leaders, but there was a cycle among some low-ranking solvers. To measure robustness, the new and current systems were computed with a range of simulated time-outs, to see how often the top rankings changed. That is, times above the simulated time-out are reclassified as time-outs and the rankings are computed with this data.

³ Code is at […].

2 Related Work

There is a large body of work on various aspects of experimental comparisons. We restrict ourselves to immediately related work on ranking solvers.

Non-mathematical considerations for a scoring method are discussed in general terms by Le Berre and Simon [LBS04], and influenced several aspects of the method proposed here. One such aspect is our provision for many timing differences to be treated as a tie, because it appears that many people consider that calling one solver the winner in these cases is a distortion. The reaction to this perceived distortion has been to reduce the importance of speed to nearly nothing, as long as the solver stays within the time limit. We hope that treating "minor" timing differences as ties will make the technique more acceptable than prior techniques that used time as the major consideration.
Brglez and co-authors [BLS05, BO07] replicate instances into classes to gather statistics. Their goals are quite different from ranking a competition. Nikolić [Nik10] extends these ideas to compare more than two programs. The non-mathematical, practical issues mentioned in Definition 1 are not considered in these papers.

Pulina conducted an extensive empirical evaluation of several scoring methods [Pul06]. One criterion he used is similar to the one we use, decreasing the time limit and measuring stability. Our proposed method is significantly different from those he analyzed. Most or all of the comparison methods he studied lacked the independence from a third solver. Thus a later researcher could not see where new work fit into a previous competition.

Pulina introduced the idea of viewing the ranking problem as a voting situation: each benchmark "votes" for the solvers (the "candidates") by a preference ballot that ranks them by solution time. This is a very attractive idea, but unfortunately, none of the well-known proposed voting methods satisfy the criteria of Definition 1 and elaborated further in the URL given there. There is a vast literature on this subject, as surveyed by Levin and Nalebuff [LN95], and more recently treated by Pomerol and Barba-Romero [PBR00] and Tideman [Tid06]. A detailed comparison with all proposals would take us far afield, so we restrict attention to the Schulze method, which has enjoyed recent popularity [Sch03]. That popularity is not surprising, because the Schulze method, unlike many other proposals (such as Borda), permits voters to vote equal preferences among subsets of the candidates (e.g., D = 1, (A, C) = 2, B = 7 is a valid ballot in a field of 10).

Suppose a competition is being run with six solvers and 63 benchmarks with Schulze ranking (the example may use many combinations of numbers), and the following events transpire. After 60 benchmarks have been run on all solvers, the Schulze ranking is computed and solver A is uniquely winning. On each of the last three benchmarks, solver A has the best performance of any solver and solver D has the worst performance of any solver. (For example, solver D might time out on the last three benchmarks.) However, when the Schulze ranking is computed using all 63 benchmarks, D wins. No, this is not a typo. See Appendix B of [Sch03] for complete details.⁴ It is impossible to imagine that any organizers of a competition would adopt the Schulze method, if they know about this possibility.

Moreover, this is not a quirk in the Schulze method. It is known to be present in a large class of methods that satisfy the Condorcet principle [LN95, Tid06]. The phenomenon is known as the no-show paradox [Mou88], because solver A would have been better off without the "support" of the last three benchmarks, on all of which A was the clear winner. The above example is possible under Schulze ranking and many other voting systems because these systems do not satisfy criterion (1) in Definition 1, that other solvers should not be able to affect the relative ranking of solvers A and D.

⁴ The example cited has some pairwise ties among candidates, for simplicity of presentation, but these ties can be removed by "fuzzing" without changing the outcomes.
3 What is a Tie?

Say we are comparing solvers $(R, S)$ on a set of benchmarks, $\{B_i \mid i = 1, \ldots, n\}$, with a time limit. The data is two lists of numbers, $t_i(R)$ and $t_i(S)$, the solution times of the two solvers on $B_i$. Numbers are floating point and include Inf to denote a failure of any kind. (We choose not to distinguish among failure reasons, except that a wrong answer means the solver is disqualified and its matches are not scored.) All data other than Inf is between 0 and the time limit, and we call these finite times.

We interpret the lists $t_i(R)$ and $t_i(S)$ as a series of $n$ mini-matches, each with a stake of one point. A tie awards 0 to each solver. A win for $R$ gives $R$ a score of $1$ and gives $S$ a score of $-1$, and the reverse if $S$ wins.

Clearly if $t_i(R) = t_i(S)$, the result is a tie. The question is what other outcomes should be considered a tie. The current method treats any pair of finite times as a tie; the only win is a finite time vs. Inf. The opposite extreme is that any $t_i(R) < t_i(S)$ is a win for $R$. Nikolić performed a theoretical analysis that depended on a complete absence of ties, so he "discarded" benchmarks where all times were under 5 seconds, which got rid of all the exact equalities with finite times, and then treated any finite time difference as a win [Nik10]. This is essentially the same as saying any pair of times under 5 seconds is a tie.

Our thesis is that some time differences should be considered "inconsequential" in the sense that someone trying to select the better solver between $R$ and $S$ for use in an application would not be influenced by these time differences. We hypothesize that on longer runs, larger time differences would be considered inconsequential, so we want to define a tie zone whose width grows as run times get longer. We also believe that most people agree that below some threshold, all time differences are inconsequential. The user decides where to set this threshold, which we call noise, and which is the only user-specified parameter needed to specify the tie zone.

The growth rate we choose is founded in recurrent-event theory. We model the solver's computation as a long series of search events with independent outcomes. The probability that a search event has a successful outcome is very small, and the solver terminates upon the first successful search event. This is the well-known Poisson process. The standard deviation of the time to termination is proportional to the square root of the average time to termination. If two solvers have the same (theoretical) average time to termination on benchmark $B_i$, then their time difference is a random variable with mean zero and standard deviation proportional to the square root of their common average. We propose that observed time differences less than some number of standard deviations should be considered as ties, because they do not provide compelling evidence that one solver's average is really shorter than the other's. This is purely a heuristic model, of course.

Once we accept the idea that it is sensible for the tie zone to grow proportionally to $\sqrt{(t_i(R) + t_i(S))/2}$, the square root of the average of the two observed solving times, all that remains is to choose a constant of proportionality. The user makes this choice indirectly by specifying a scalar parameter called noise.

[Figure 1: plot of faster, average, and slower solver times (vertical axis) against average solver time (horizontal axis), with the tie zone marked.]

Fig. 1. Tie zone for noise = 1 minute; times in minutes. Lower curve: faster solver time; upper curve: slower solver time; middle line: average of upper and lower curves.

As stated above, the intuition is that all solution times at or below the noise level should be treated as indistinguishable. Figure 1 shows the tie zone for noise = one minute. Any pair of times, both under one minute, fall into the tie zone. To generate the desired tie zone, we define:

\[ \Delta = \sqrt{\frac{\text{noise}}{2}} \, \sqrt{\frac{t_i(R) + t_i(S)}{2}} . \]

Then the "tie zone" extends from $(t_i(R) + t_i(S))/2 - \Delta$ to $(t_i(R) + t_i(S))/2 + \Delta$. For $R$ to win it is necessary that

\[ t_i(R) < \frac{t_i(R) + t_i(S)}{2} - \Delta . \]

Since $t_i(R) \ge 0$, $S$ is assured of (at least) a tie whenever $t_i(S) \le \text{noise}$, as was desired by the user. (In the extreme case $t_i(R) = 0$ and $t_i(S) = \text{noise}$, the average is $\text{noise}/2$ and $\Delta = \text{noise}/2$, so $t_i(R)$ sits exactly on the lower edge of the tie zone.)

4 Pairwise Matches

Say we are comparing solvers $(R, S)$ on a set of benchmarks, $\{B_i \mid i = 1, \ldots, n\}$, with a time limit. The data is two lists of numbers, $t_i(R)$ and $t_i(S)$. As described in Section 3, we interpret the lists $t_i(R)$ and $t_i(S)$ as a series of $n$ mini-matches, with each outcome for $R$ being $1$, $0$, or $-1$. The (algebraic) total is the raw score for the match, denoted $\mathrm{raw}(R, S)$. For simplicity, all pairs are processed, so $S$ is compared with $R$ at some point to get $\mathrm{raw}(S, R)$, which of course equals $-\mathrm{raw}(R, S)$.
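The tie zone and the mini-match scoring described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's csh/sed/awk implementation: the function names are hypothetical, and the choice to score a failure (Inf) as a time equal to the time limit is an assumption of this sketch, since Section 3 does not spell out how Inf enters the tie-zone formula.

```python
import math

INF = math.inf  # denotes a failure of any kind, as in Section 3


def tie_half_width(t_r, t_s, noise):
    """Delta = sqrt(noise / 2) * sqrt(average of the two times)."""
    return math.sqrt(noise / 2.0) * math.sqrt((t_r + t_s) / 2.0)


def mini_match(t_r, t_s, noise, time_limit):
    """Outcome for R on one benchmark: 1 (R wins), 0 (tie), -1 (S wins).

    ASSUMPTION of this sketch: a failure is scored as if the solver had
    taken exactly `time_limit`, so two failures tie each other.
    """
    r = min(t_r, time_limit)
    s = min(t_s, time_limit)
    average = (r + s) / 2.0
    delta = tie_half_width(r, s, noise)
    if r < average - delta:
        return 1
    if s < average - delta:
        return -1
    return 0


def raw_and_decisive(times_r, times_s, noise, time_limit):
    """Return (raw(R, S), decisive(R, S)) over paired lists of times."""
    outcomes = [mini_match(a, b, noise, time_limit)
                for a, b in zip(times_r, times_s)]
    raw = sum(outcomes)
    decisive = sum(1 for outcome in outcomes if outcome != 0)
    return raw, decisive
```

With noise = 1 minute and times in minutes, a 9-minute run beats a 15-minute run, while 10 versus 14 minutes is a tie: the average is 12 in both cases and the half-width is about 2.45, which agrees with the curves described for Figure 1.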
The value of $\mathrm{raw}(R, S)$ can be used to test the null hypothesis, which is that $R$ and $S$ have an equal probability of winning a random mini-match. We also need the number of decisive (non-tie) mini-matches, denoted $\mathrm{decisive}(R, S)$. Then the Student $t$ parameter is given by

\[ \text{Student } t = \frac{\mathrm{raw}(R, S)}{\sqrt{\mathrm{decisive}(R, S)}} , \tag{1} \]

which expresses the raw score in standard deviations. Statistically, the match is modeled as $\mathrm{decisive}(R, S)$ fair coin flips. If $\mathrm{decisive}(R, S)$ is large, the distribution is close to Gaussian. We are certainly justified in rejecting the null hypothesis when $|t| \ge 2$, without figuring out the exact value of $p$, which is the probability of observing a $t$-value this large or larger. In this case $p < 0.03$.

To summarize, if Student $t \ge 2$ in (1), we may conclude that $R$ is faster than $S$ with high confidence, on a space of benchmarks for which the actual benchmarks used are representative. If Student $t = 1$ we may have "medium" confidence, because a value this large or larger would occur with probability about 0.16 if the solvers were really equally fast on average.

5 Competition Ranking

We propose to create a $k$-way competition ranking of solvers $S_1, \ldots, S_k$ by forming a $k \times k$ matrix $M$ in which

\[ M_{i,j} = \begin{cases} 1 & \text{if } \mathrm{raw}(S_i, S_j) > 0 \\ 0.5 & \text{if } \mathrm{raw}(S_i, S_j) = 0 \\ 0 & \text{if } \mathrm{raw}(S_i, S_j) < 0 \end{cases} \tag{2} \]

This matrix can be interpreted as specifying a directed graph (also called $M$), where solvers are vertices and an edge from $S_i$ to $S_j$ exists wherever $M_{i,j} > 0$. If $M_{i,j} = M_{j,i} = 0.5$, there are edges in both directions. If this graph is acyclic, it defines a total order among the solvers, which we call the dominance order.

In practice, we usually are not concerned about establishing a total order among all participants; it is sufficient if there is a total order among the leaders, perhaps the 5-6 top ranks. First, let us focus on the case that the leaders do not have any tied matches, not even with a non-leader. Things are slightly more complicated otherwise.

In the case of no leader ties, adding up the rows of the leaders provides a "definitive" ranking. That is, if $M$ restricted to the leaders defines an acyclic graph, each row sum is unique, and ranking the leaders by row sums is unambiguous.

However, it is possible, even in the case of no leader ties, that the graph has a cycle [Nik10]. This possibility is present because ties are not transitive in mini-matches. In other words, on a specific benchmark $B_i$, it is possible that $S_1$ ties with $S_2$ and $S_2$ ties with $S_3$, but $S_1$ wins or loses against $S_3$. We can easily create a set of timings on three benchmarks so that $\mathrm{raw}(S_1, S_2) = 1$, $\mathrm{raw}(S_2, S_3) = 1$, and $\mathrm{raw}(S_3, S_1) = 1$. Longer and more complex cyclic structures can be constructed, as well. If a cyclic structure is present, then at least two row sums must be equal (still in the case of no leader ties in matches).

The conclusion from the preceding discussion is that row sums can provide quick hints and are easily interpreted, but may be inconclusive. If used carelessly in the presence of ties, they can be misleading. On the positive side, we expect them to be adequate for most situations. But "most" is not good enough, so we need a procedure that always gives an unambiguous result.

Example 2. This small example shows some complications that can arise, involving pairwise ties and cycles. Let us assume that the time-out is 15 and that a difference of 3 is a winning margin in the range of times shown below, while a difference of 2 is a tie. The first table shows times for three solvers on three benchmarks. The second shows the raw scores. The third shows the dominance matrix.

Times:
        S1    S2    S3
  B1    10    13    14
  B2    14    12    10
  B3    12    11    14

Raw scores:
        S1    S2    S3
  S1     0     1     0
  S2    -1     0     1
  S3     0    -1     0

Dominance matrix:
        S1    S2    S3
  S1     0     1     0.5
  S2     0     0     1
  S3     0.5   0     0

Although $S_1$ beats $S_2$ and $S_2$ beats $S_3$, still $S_3$ ties $S_1$, so all three are cyclically related. However, no row sums are equal.
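The quantities of Sections 4 and 5 can be checked on Example 2 with a short Python sketch. The function names are hypothetical, and the fixed winning margin of 3 is the simplification stated in Example 2 rather than the square-root tie zone of Section 3. The diagonal of the matrix is left at 0, as in the tables of the paper.

```python
import math

# Solution times from Example 2: rows are benchmarks B1..B3,
# columns are solvers S1..S3.
TIMES = [
    [10, 13, 14],
    [14, 12, 10],
    [12, 11, 14],
]
WIN_MARGIN = 3  # Example 2: a difference of 3 wins, a difference of 2 ties


def mini_match(t_r, t_s, margin=WIN_MARGIN):
    """Outcome for R: 1 if R is faster by at least `margin`, -1 if S is."""
    if t_s - t_r >= margin:
        return 1
    if t_r - t_s >= margin:
        return -1
    return 0


def match_stats(times, r, s):
    """Return (raw, decisive, student_t) for solver column r against s."""
    outcomes = [mini_match(row[r], row[s]) for row in times]
    raw = sum(outcomes)
    decisive = sum(1 for outcome in outcomes if outcome != 0)
    # Equation (1); defined as 0.0 here when every mini-match is a tie.
    student_t = raw / math.sqrt(decisive) if decisive else 0.0
    return raw, decisive, student_t


def dominance_matrix(times):
    """Build M as in equation (2), with the diagonal left at 0."""
    k = len(times[0])
    m = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(k):
            if i == j:
                continue
            raw, _, _ = match_stats(times, i, j)
            m[i][j] = 1.0 if raw > 0 else 0.5 if raw == 0 else 0.0
    return m


if __name__ == "__main__":
    for row in dominance_matrix(TIMES):
        print(row)
```

Run on the times of Example 2, this reproduces the raw scores and the dominance matrix shown in the example. For instance, $S_1$ against $S_3$ has one win each way and one tie, so the raw score is 0 with two decisive mini-matches.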
Treating $M$ simply as a connected, directed graph, its vertices (the solvers) can be partitioned into strongly connected components. (For small graphs, the famous linear-time procedure is unnecessary; matrix multiplications and additions suffice.) The component graph, obtained by collapsing every strongly connected component to a single node, defines a total order.

We propose that all solvers living in the same strongly connected component (SCC) of the graph $M$ described in (2) shall be equally ranked; otherwise the relative ranking is determined by the component graph. This policy provides an unambiguous specification for all situations.

If a tie-break is necessary (e.g., an indivisible trophy is awarded), we recommend that all solvers in a single SCC shall be ranked among themselves by the sums of their raw scores within the SCC. That is, if $S_1, \ldots, S_k$ comprise an SCC, then

\[ \mathrm{TieBreak}(S_i) = \sum_{j=1}^{k} \mathrm{raw}(S_i, S_j) . \]

This amounts to treating each mini-match among $S_1, \ldots, S_k$ as a single-point contest between two solvers in a round-robin event similar to teams in a league playing a season, so we call this the round-robin tie-break method. It is also known as Copeland's method in the voting-system literature [PBR00]. The advantage of this method is that it is easily understood and familiar. Its disadvantage is that the comparative ranking of $S_1$ and $S_2$ depends on mini-matches involving other solvers in the SCC.
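The SCC policy and the round-robin tie-break can be sketched as follows. This is a hedged illustration, not the author's code: the function names are hypothetical, and a Warshall-style transitive closure stands in for the "matrix multiplications and additions" mentioned above. The ordering step assumes, as the paper states, that the component graph is a total order, so an earlier component reaches strictly more vertices than a later one.

```python
# Dominance matrix and raw scores of Example 2 (solvers S1, S2, S3).
M = [
    [0.0, 1.0, 0.5],
    [0.0, 0.0, 1.0],
    [0.5, 0.0, 0.0],
]
RAW = [
    [0, 1, 0],
    [-1, 0, 1],
    [0, -1, 0],
]


def reachability(m):
    """Transitive closure of the graph with an edge i -> j where m[i][j] > 0."""
    k = len(m)
    reach = [[i == j or m[i][j] > 0 for j in range(k)] for i in range(k)]
    for mid in range(k):
        for i in range(k):
            if reach[i][mid]:
                for j in range(k):
                    if reach[mid][j]:
                        reach[i][j] = True
    return reach


def ranked_components(m):
    """Strongly connected components, best-ranked component first."""
    k = len(m)
    reach = reachability(m)
    seen = set()
    components = []
    for i in range(k):
        if i in seen:
            continue
        # j shares a component with i when each can reach the other.
        component = [j for j in range(k) if reach[i][j] and reach[j][i]]
        seen.update(component)
        components.append(component)
    # A component ranked earlier reaches more vertices than a later one.
    components.sort(key=lambda comp: -sum(reach[comp[0]]))
    return components


def round_robin_order(raw, component):
    """Order the solvers of one SCC by TieBreak(S_i), largest sum first."""
    sums = {i: sum(raw[i][j] for j in component) for i in component}
    return sorted(component, key=lambda i: -sums[i])


if __name__ == "__main__":
    for component in ranked_components(M):
        print(component, round_robin_order(RAW, component))
```

For Example 2, all three solvers land in one component, and the tie-break sums are 1, 0, and -1, which gives the order S1, S2, S3. Solvers with equal tie-break sums keep their input order here; the paper does not say how such a residual tie should be resolved.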
Table 1. The dominance matrix for 16 solvers in the final phase, based on careful ranking.

   #  Solver          1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
   1  CircUs          0  0  0  0  0  0  0  1  1  1  1  1  0  0  1  0
   2  LySAT i         1  0  1  1  1  0  1  1  1  1  1  1  0  1  1  0
   3  MXC             1  0  0  0  0  0  1  1  1  1  1  1  0  1  1  0
   4  ManySAT 1.1     1  0  1  0  0  0  1  1  1  1  1  1  0  1  1  0
   5  MiniSAT 09z     1  0  1  1  0  0  1  1  1  1  1  1  0  1  1  0
   6  MiniSat 2.1     1  1  1  1  1  0  1  1  1  1  1  1  0  1  1  0
   7  Rsat            1  0  0  0  0  0  0  1  1  1  1  1  0  1  1  0
   8  SAT07 Rsat      0  0  0  0  0  0  0  0  1  0  1  1  0  0  0  0
   9  SAT07 picosat   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0
  10  SATzilla        0  0  0  0  0  0  0  1  1  0  1  1  0  0  1  0
  11  SApperloT       0  0  0  0  0  0  0  0  1  0  0  0  0  0  0  0
  12  clasp           0  0  0  0  0  0  0  0  1  0  1  0  0  0  0  0
  13  glucose         1  1  1  1  1  1  1  1  1  1  1  1  0  1  1  0
  14  kw              1  0  0  0  0  0  0  1  1  1  1  1  0  0  0  0
  15  minisat cumr    0  0  0  0  0  0  0  1  1  0  1  1  0  1  0  0
  16  precosat        1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  0

In practice, we expect SCCs to be about 2-4 solvers. Outcomes perceived as being "unfair" seem unlikely, because all the solvers involved are peers.

In Example 2, the round-robin tie-break makes S1 > S2 > S3. Notice that tweaking the times by 0.1 does not change the result, using this method. However, with the solution-count method, S1 and S2 are tied with the times as shown, but tweaking can make either one the winner.

6 Results on the SAT 2009 Competition

The final round of the Application section in the SAT 2009 Competition⁵ was conducted with a time limit of 10000 seconds, used 292 benchmarks, and involved 16 solvers. The organizers were Daniel Le Berre, Laurent Simon, and Olivier Roussel. The discussion uses abbreviated solver names; please see the web page for complete names. The solvers were ranked for the competition using the solution-count ranking method described in Section 1.

We computed the rankings that would have resulted using careful ranking. The dominance matrix discussed in Section 5 is shown in Table 1. Examination of this matrix shows that solvers 1, 10, 14, and 15 are in one strongly connected component, so they share ranks 9-12, according to Section 5. All other solvers are not in any cycles, so have unique ranks.

⁵ See […].

Table 2. Numbers of changes in top three ranks for two ranking methods and various time limits (seconds).

  time range     solution-count   careful rank
  1600-2000             8               4
  2000-4000            10               0
  4000-6000             4               0
  6000-8000             0               0
  8000-10000            1               0

6.1 Robustness of Ranking

We analyzed the robustness of the ranking methods by counting how many times there was some change in the top three ranks as the time limit was varied continuously. We note that precosat stayed in first place for all time limits 4000 seconds and above, in both ranking methods. Table 2 summarizes the numbers of changes in various ranges. (Returning to an earlier permutation is considered a change, too.) It seems clear that careful ranking is more robust by this criterion.

We offer this intuitive explanation for why careful ranking gives fewer variations as the time limit changes. With solution-count ranking, a mini-match victory is only temporary, as the time limit increases: S1 wins the mini-match against S2 only if S1 succeeds and S2 times out. But for a high enough time limit S2 also succeeds (in theory), and the victory is wiped out. However, with careful ranking, once the time limit is sufficiently above the solving time of S1 and S2 still has not succeeded, the victory is permanent for this mini-match.

6.2 Differences in Ranking

The two ranking methods, careful ranking and solution-count ranking, disagreed on the third place solver with the final time limit of 10,000 seconds. MiniSat 2.1 held third place behind glucose for all time limits above 2000 under careful ranking. Under the solution-count ranking, MiniSat 2.1 and LySAT i exchanged places two and three several times, with LySAT i finally taking the lead after about 8100. By the 10,000 mark MiniSat 2.1 was in sixth place.

Under the solution-count ranking, precosat and glucose were apparently "neck and neck," as they each solved 204 instances. The tie-break was on cumulative CPU time, and precosat won. Other solvers were in the 190's, well separated from the two leaders.

Quite a different picture emerges under careful ranking. We show three matches with their statistics. "Std. Devs." refers to the Student t from (1).
  Winner        Loser         Raw Score   Std. Devs.   Prob. Faster
  precosat      glucose          16          1.65          0.97
  glucose       MiniSat 2.1       8          0.83          0.79
  MiniSat 2.1   LySAT i           8          0.86          0.80

In this ranking, precosat has a more convincing win than any of the others.

6.3 Tie-Break Illustration

Recall that Table 1 shows that solvers 1, 10, 14, and 15 are in one strongly connected component, sharing ranks 9-12. For purposes of illustration, we apply the round-robin tie-break procedure described in Section 5 to these four solvers, although in practice it is probably not important to break this tie. The table below shows the raw scores for the solvers involved, followed by the dominance subgraph.

Raw scores (magnitudes only; the signs of the entries are not recoverable here):
         S1    S10   S14   S15
  S1      0    13     9     1
  S10    13     0     8     3
  S14     9     8     0     1
  S15     1     3     1     0

[Figure: dominance subgraph on S1, S10, S14, and S15.]

The round-robin ranking, based on the row sums, gives S14 > S10 > S15 > S1.

7 Conclusion

This paper described a new ranking system that provides a measure of statistical significance, allows for small timing differences to be treated as ties, and ensures that a pairwise comparison between two solvers is not influenced by a third solver. The latter property also allows later researchers to replicate the competition conditions and find out where their solver would have ranked. Another application of this technique is to evaluate whether software changes from one version to another caused a performance difference that is statistically significant, or whether the difference is in a range that might well just be random.

Acknowledgment

We thank Daniel Le Berre, Laurent Simon, and Olivier Roussel for making the 2009 competition data available in text form.

References

[BLS05] F. Brglez, X. Y. Li, and M. F. Stallmann. On SAT instance classes and a method for reliable performance experiments with SAT solvers. Annals of Mathematics and Artificial Intelligence, 43:1-34, 2005.
[BO07] F. Brglez and J. A. Osborne. Performance testing of combinatorial solvers with isomorph class instances. In Workshop on Experimental Computer Science, San Diego, 2007. ACM. (Co-located with FCRC 2007.)
[LBS04] D. Le Berre and L. Simon. The essentials of the SAT 2003 competition. In Proc. SAT, 2004.
[LN95] J. Levin and B. Nalebuff. An introduction to vote-counting schemes. The Journal of Economic Perspectives, 9:3-26, 1995.
[Mou88] H. Moulin. Condorcet's principle implies the no-show paradox. The Journal of Economic Theory, 45:53-64, 1988.
[Nik10] M. Nikolić. Statistical methodology for comparison of SAT solvers. In Proc. SAT, pages 158-171, 2010.
[PBR00] J.-C. Pomerol and S. Barba-Romero. Multicriterion Decision in Management: Principles and Practice. Springer, 2000.
[Pul06] L. Pulina. Empirical evaluation of scoring methods. In Third European Starting AI Researcher Symposium, 2006.
[Sch03] M. Schulze. A new monotonic and clone-independent single-winner election method. In N. Tideman, editor, Voting Matters, Issue 17, pages 9-19. Oct. 2003. URL: […].
[Tid06] N. Tideman. Collective Decisions and Voting: the Potential for Public Choice. Ashgate, 2006.