A Genetic Algorithm for Detecting Significant Floating-Point Inaccuracies

Daming Zou†, Ran Wang†, Yingfei Xiong†, Lu Zhang†, Zhendong Su‡, Hong Mei†
Key Laboratory of High Confidence Software Technologies (Peking University), MoE, China
†Institute of Software, School of EECS, Peking University, China
‡Department of Computer Science, University of California, Davis, USA
{zoudm, lilianwangran, xiongyf, zhanglucs, meih}@pku.edu.cn, su@ucdavis.edu

Abstract: It is well known that using floating-point numbers may inevitably produce inaccurate results and sometimes even cause serious software failures. Safety-critical software often has strict requirements on the upper bound of inaccuracy, and a crucial task in testing is to check whether significant inaccuracies may be produced. The main existing approach to the floating-point inaccuracy problem is error analysis, which produces an upper bound of inaccuracies that may occur. However, a high upper bound does not guarantee the existence of inaccuracy defects, nor does it give developers any concrete test inputs for debugging. In this paper, we propose the first metaheuristic search-based approach to automatically generating test inputs that aim to trigger significant inaccuracies in floating-point programs. Our approach is based on the following two insights: (1) with FPDebug, a recently proposed dynamic analysis approach, we can build a reliable fitness function to guide the search; (2) two main factors (the scales of exponents and the bit formations of significands) may have significant impact on the accuracy of the output, but in largely different ways. We have implemented and evaluated our approach over 154 real-world floating-point functions. The results show that our approach can detect significant inaccuracies in the subjects.

We sincerely acknowledge Dr. Hao Zhong at Shanghai Jiaotong University for his useful and helpful feedback on an early draft of this paper, and Xinrui He at Peking University for her help on setting up the experiments. The authors from Peking University are supported by the National Basic Research Program of China under Grant No. 2014CB347701, and the National Natural Science Foundation of China under Grant No. 61202071, 61225007, 61421091, 61332010. Zhendong Su is partially supported by United States NSF Grants 1117603, 1319187, and 1349528. Yingfei Xiong is the corresponding author.

I. INTRODUCTION

Inaccuracy caused by floating-point numbers is a well-known problem in software development. In critical software systems, floating-point inaccuracy may have disastrous results. An example well cited in the literature is the failure of a Patriot missile to intercept an incoming missile in the first Persian Gulf War, due to the accumulated floating-point errors during the continuous tracking and guidance. This failure caused the loss of 28 lives and around 100 injuries.

Floating-point numbers are usually inexact because they use a finite number of digits to represent a real number. For example, when we represent 0.1 as a (single precision) floating-point number, because 0.1 cannot be represented in finite digits in a binary fraction, the value is in fact 0.100000001490116119384765625. One consequence of the inaccuracy in representation is the rounding error, e.g., when adding a small number to a large number, significant digits in the small number may be rounded off due to the limited digits available to represent the final result. Accumulated rounding errors during the computation may result in significant inaccuracy in the output.

Because of the importance of ensuring accuracy in floating-point calculation, several approaches [1]–[3] have been proposed to detect floating-point inaccuracies. Most approaches use static analysis based on interval arithmetic [4] or affine arithmetic [5], trying to determine the possible range of errors in the result of the computation. However, we observe several limitations in reporting the ranges of errors:

• Due to the limitations of static analysis, the computed range is often an over-approximation of the actual error, and the difference is often very large. Even in the latest approach [1], the computed upper bound is usually several orders of magnitude larger than the actual error, and is sometimes infinite. As a result, even if a large upper bound is reported, it is still unknown whether or not an inaccuracy problem exists.

• To debug an inaccuracy problem, it would be more convenient to have an input that triggers the problem, so that developers can follow the execution to discover the root cause of the problem. However, as reported by Bao and Zhang [6], there may be only a small portion of inputs among all possible inputs causing significant inaccuracies in the output. Thus, it would be quite difficult for developers to obtain such an input manually.

To overcome these problems, this paper proposes a metaheuristic search-based approach that aims to generate a test input for a program to maximize the error of the output. To the best of our knowledge, our approach is the first metaheuristic test generation approach aiming to detect floating-point inaccuracies. Existing test generation approaches [7], [8] for floating-point programs focus on maximizing path coverage rather than output errors.

Our approach is based on the following two insights. First, with FPDebug, an approach recently proposed by Benz et al. [9], we can obtain the likely error that occurred in the output of a particular concrete execution. Thus, we can build a fitness function around this error to guide the search. Second, there are two main factors of the input floating-point numbers that may affect the error of the output: the scale of the exponent and the bit formation of the significand, but their relations to accuracy exhibit different characteristics. These factors could be exploited for designing efficient search algorithms.

TABLE I: IEEE 754 floating-point representation

                      Sign   Exponent   Significand
  Single Precision    1      8          23
  Double Precision    1      11         52

In summary, we make the following major contributions:

• We perform an empirical analysis to uncover the relations between different factors and the accuracy of the output. The results suggest that both the scales of the exponents and the bit formations of the significands may substantially affect accuracy. While only exponents in a small interval lead to significant inaccuracies, a large portion of significands may lead to significant inaccuracies.

• We design a novel genetic algorithm, the locality-sensitive genetic algorithm (LSGA), based on the results of the empirical analysis. Our basic idea is to evolve the exponent to hit the small interval while randomly generating the significand, as the large portion of suitable bit formations is easy to hit. The fitness function is built upon the output of FPDebug [9].

• We evaluate our approach by (1) a sanity check on six classic examples [10] of stable and unstable algorithms, and (2) a series of experiments on 154 floating-point functions selected from the latest version of the GNU Scientific Library. Our experiments compare three search algorithms: our own, a standard genetic algorithm, and a random search algorithm. Our results reveal that (1) our algorithm exhibits absolute superiority over the other two algorithms, and (2) our approach is able to find extremely large inaccuracies in widely used real-world scientific functions.

The rest of the paper is organized as follows. Section II introduces some background knowledge, while Section III further motivates our research. Section IV presents the empirical analysis and our algorithm design. Section V reports the two evaluations. Section VI discusses the main limitations and possible future research. Section VII discusses related work, and Section VIII concludes.

II. BACKGROUND

A. Format of Floating-Point Numbers

According to the IEEE 754 standard [11], a floating-point number contains three parts: sign, exponent, and significand. Table I depicts the numbers of bits of the three parts in either a single precision number or a double precision number. Let us denote the sign as s, the value of the exponent as e, and the value of the significand as f. If all bits of the exponent are 1, the floating-point number is one of the special values +∞, −∞, or NaN, where NaN indicates errors in computation such as division by zero. Otherwise, the value of a floating-point number is given by Formula 1:

$$(-1)^{s} \times f \times 2^{e} \qquad (1)$$

Suppose that the significand is in the form of $b_0 b_1 \ldots b_{n-1}$. The value of the significand, f, is defined by Formula 2:

$$f = \begin{cases} \sum_{i=0}^{n-1} \dfrac{b_i}{2^{i+1}}, & \text{if all bits of } e \text{ are } 0 \\[4pt] 1 + \sum_{i=0}^{n-1} \dfrac{b_i}{2^{i+1}}, & \text{otherwise} \end{cases} \qquad (2)$$

Suppose that the exponent is in the form of $b_0 b_1 \ldots b_{n-1}$. The value of the exponent, e, is defined by Formula 3:

$$e = \sum_{i=0}^{n-1} 2^{i} b_i - 2^{(n-1)} + 1 \qquad (3)$$

B. Genetic Algorithm

A Genetic Algorithm (GA) [12] is a metaheuristic search technique that simulates the process of natural selection for solving optimization problems. In a GA, each candidate solution is called an individual, and has a set of properties that can be represented in a binary form. There is also a fitness function f to evaluate how close an individual is to an optimal solution.

The process of a GA typically starts from a set of individuals, randomly selected or pre-defined, which form the first generation. The population size is problem-dependent. To create the next generation, all the individuals in the current generation are put into a selection pool, in which each individual has a probability, dependent on its fitness, of becoming a parent of individuals in the next generation. To create an individual in the next generation, a pair of parent solutions is selected from the selection pool. Then, two types of operations (i.e., crossover and mutation) are used to create a child of the two parents. The generated children, optionally plus the individuals from the previous generation, form the next generation. The selection-creation loop is repeated until a termination condition is reached, such as finding a good enough solution, reaching the maximum number of generations, or reaching the time limit.

There are several key components in designing a genetic algorithm, such as the selection method (i.e., how to select individuals for reproduction), the crossover operator (i.e., how to produce a child from two parents), the mutation operator (i.e., how to mutate a child), and the initial population (i.e., how the first generation is populated).

III. MOTIVATING EXAMPLE

Example. To motivate our research, let us consider function F(x) defined in the following code snippet. F(x) is a carefully constructed example to demonstrate the problem of floating-point inaccuracy, containing several major operations on floating-point numbers (addition, subtraction, division) and a common code pattern (adding up many numbers in a loop). F(x) should always return a constant number in real arithmetic.

float F(float x) {
1:  int i, n = 8192;
2:  float y, z;
3:  y = z = n;
4:  if (x < 0) x = -x;
5:  for (i = 0; i < n; i++)
6:      y = y + x;
7:  y = y / z;
8:  return (0.125f + x) / (y - 0.875f);
}

As the loop on Lines 5 and 6 will be executed n times, the value of y before executing Line 7 should be equal to n(1 + |x|). To make the analysis easy, we set n to 8192 = 2^13. As a result, the value of y after executing Line 7 should be equal to 1 + |x|. Therefore, F(x) should always return 1. However, due to the accumulation of rounding errors in the loop, the value of y after executing Line 7 cannot be exactly 1 + |x|. This error will be further magnified on Line 8. As a result, F(x) may produce significant inaccuracy for some input x. The most inaccurate output is 1.0039062.

Range of problematic inputs. The range of x for F(x) to produce significant inaccuracies is not large. The reason is that, when the value of |x| becomes larger (e.g., 10 times larger than 0.125), the value of F(x) will be mainly decided by |x|, not the accumulated rounding error. In fact, all the cases of inaccuracy over 0.001 occur when x is between −0.4 and 0.4. Note that, when the value of x is very close to 0, F(x) cannot produce very large inaccuracy. The reason is that a very small |x| (e.g., smaller than 0.0001) would not produce a large enough rounding error when executing the loop only 8192 times.

Considering that the number of possible values of x is huge when testing F(x), generating tests to trigger significant inaccuracies of F(x) should be difficult. In fact, if we want to trigger an inaccuracy that is close to the largest inaccuracy 0.0039062, e.g., an absolute error larger than 0.0039, the range of |x| is much smaller (i.e., between 0.0004874 and 0.0004883). Note that existing coverage-based test generation cannot help here, since any value of x can achieve 100% branch coverage.

Factors affecting accuracy. This example demonstrates that there are two distinctive factors that may affect the error of the output, and the impact of one factor may be very different from the other. The first factor is the scale of the exponent in the input. Since the scale of a floating-point number is mainly determined by its exponent, only when the exponent falls into a small range can the input trigger a large error. The second factor affecting the accuracy of the output is the bit formation of the significand. Due to the "round to even" policy adopted by floating-point numbers, large inaccuracies appear only when the accumulated rounding errors do not cancel each other. This requires that the formation of bits in the significand exhibits certain patterns.

IV. APPROACH

A. Problem Definition

Given a program using floating-point numbers (denoted as P) with M input parameters (denoted as I1, I2, ..., IM), we treat the problem of detecting floating-point inaccuracy in P as a search problem. The search space of this search problem is the space represented by all the possible combinations of the values of the M input parameters. The aim of the search problem is to find a particular input (i.e., a combination of the values of the M input parameters) that maximizes the error of the output.

To solve this search problem, we need a criterion to determine whether one input would lead to a more inaccurate output than another input. Search algorithms will use this criterion to guide the searching process. Unlike coverage, which is widely used as the criterion in search-based test generation, it is not straightforward to set up the criterion needed in our approach. Fortunately, Benz et al. [9] recently proposed a dynamic analysis technique which dynamically increases the precision of the floating-point numbers to calculate both the likely absolute error and the likely relative error of the output of a program for an arbitrary input.¹ Following the common practice [16], we use relative error to measure the inaccuracy of the output. Let us denote the real result as R and the actual output as A. The relative error is defined as |R − A| / |R|.

¹ These errors are likely errors because increasing the precision does not guarantee a more accurate result in all cases [13]–[15].

With this basic framework, different metaheuristic search techniques can be adopted for detecting significant inaccuracies. However, it is not easy to discover significant inaccuracies through searching. First, as our motivating example and an existing study [6] have shown, often only a very small portion among all possible floating-point numbers may lead to serious inaccuracies. When there is more than one input parameter for a floating-point function, the probability of hitting a large error becomes very low. Second, FPDebug has a slowdown of several hundreds of times [9], such that we could test only a relatively small number of input values during the search process. As a result, it is critical to design an effective search algorithm that hits large inaccuracies quickly. To design such an algorithm, we perform a small empirical analysis to understand the relation between the accuracy of the output and different factors.

B. Empirical Analysis

In Section III we have seen that there are two main factors that may affect the accuracy of the output, each corresponding to one of the three main components of a floating-point number. These two factors can be exploited to design an effective search algorithm. To further understand the relation between the two factors and the accuracy of the output, we performed a small empirical analysis.

Analysis setup. In our analysis, we randomly chose four functions from the GNU Scientific Library (GSL). More description of GSL can be found in Section V. The four functions are bessel_K0, Ci, erf, and legendre_Q1. Function bessel_K0 computes the cylindrical Bessel function, Ci computes the Cosine integral, erf computes the Gauss error function, and legendre_Q1 computes the Legendre function. We deliberately chose functions with one floating-point input parameter to simplify our analysis. We invoked these programs in FPDebug with different inputs, and monitored how the relative errors of the output change with the input.

To better understand the two factors individually, we fixed one factor and changed the other. First we fixed the significand and changed the exponent. For each program, we randomly generated three significands. For each significand, we combined it with every possible exponent allowed by the double precision format to form an input to a subject program, and then we executed the program in FPDebug to get the error for each input. Second we fixed the exponent and changed the significand. We randomly generated two exponents for each program. However, because of the slowdown from FPDebug, it is not possible to test the whole space of significands in double precision. As a result, we generated 100,000 random significands and tested each pair of significand and exponent. In all tests, we set the sign bit to 0.

Fig. 1: erf at significand 0x34873b27b23c6
Fig. 2: erf at exponent 1023

Results. Fig. 1 shows different relative errors at different exponents for function erf when the significand is fixed at 0x34873b27b23c6. Fig. 2 shows different relative errors at different significands for erf when the exponent is fixed at 1023. Note that in the X axes of both figures, we interpret both the exponents and the significands as unsigned integers, and report their integer values. We shall use the two figures as examples to illustrate our results; the experiments on all functions exhibit very similar patterns.

First, we observe that both the exponents and the significands have a great impact on the relative errors. In all functions, by changing the significands we could achieve a difference of 59 orders of magnitude, and by changing the exponents we could achieve a difference of 418 orders of magnitude.

Second, we observe that the exponents that invoke large relative errors stay in a small interval of the axis, while the significands that invoke large errors are distributed evenly on the axis. This result confirms our analysis about the two factors in Section III. How exponents affect accuracy is decided by their scales, while how significands affect accuracy is decided by their bit formations.

Third, as we can see from the figures, the portion of exponents that invoke large relative errors is usually very small. In the worst case, no more than 0.1% of the exponents invoke errors that are within two orders of magnitude of the largest error. On the other hand, the portion of significands that invoke large relative errors is usually large. Even in the worst case, we still have more than 28% of the significands invoking errors that are within two orders of magnitude of the largest error. Note that this result is consistent with the existing study [6]: although a large portion of significands may invoke large errors, the probability of a random input invoking a large error is still very small, as both the significand and the exponent need to invoke large errors.

Fourth, there usually exist exponents that are near the interval of large errors and lead to errors higher than the average. As shown in Fig. 1, there is a small burst of error near the exponent of 1000, before the large burst near 1023. Though the errors from the small burst are still much smaller than the largest error, they are higher than most other errors.

Fifth, we observe that the small interval of exponents leading to large errors is likely to be near the value 1023, which is just the median of all double-precision exponents. This indicates that the exponents around the median may have a higher probability of invoking large relative errors.

The third observation suggests that the key to designing an effective search algorithm is to hit the small interval of the exponents that lead to large errors, which is much more difficult than hitting a significand that leads to a large error. The fourth and fifth observations indicate possible strategies towards this problem. The next section explains how we design our algorithm based on these observations.

C. Our Genetic Algorithm

As revealed by the fourth observation, exponents near the small interval of large relative errors may also invoke a relatively high error. Though it is difficult to hit the small interval of large errors, it is much easier to hit an exponent near it, and gradually evolve the exponent to hit the small interval. Based on the fifth observation, exponents invoking large errors often appear near the median of all possible exponents, so searching around the median might be more effective than searching in other places.

Based on these ideas, we design a genetic algorithm, named locality-sensitive genetic algorithm (LSGA). A high-level outline of our algorithm is shown in Algorithm 1.

Algorithm 1 Outline of LSGA
1: population ← generateInitialPopulation()
2: for i ← 1 to N do
3:     input ← select(population)
4:     input ← mutate(input)
5:     population.add(input)
6: return maxError(population)

Let us first consider programs with one floating-point input parameter. Our genetic algorithm first randomly generates a set of exponents (line 1). This generation process tries to make the generated exponents evenly distributed in the space of all exponents, but favors exponents around the median. These exponents are combined with randomly generated significands and signs to form the initial population. During every iteration, we pick an individual with a high relative error (line 3), mutate it (line 4), and put it back into the population (line 5) without removing the original one from the population. The mutation process adds a random number to the exponent, regenerates its significand randomly, and flips its sign bit with a probability. This process repeats until a predefined number of iterations has been reached, and the result with the highest relative error is returned (line 6). Programs with multiple floating-point input parameters are processed in a similar way, but each time we deal with a set of floating-point numbers rather than one number.

The algorithm always randomly generates the significand because, as revealed by our empirical analysis, a randomly generated significand already has a high probability of invoking a large error. To avoid further complicating our algorithm, we use random search for the significands. We regenerate the significands at every mutation to increase the diversity of the population.

Our algorithm also drops the crossover operation of the standard genetic algorithm, because we did not find a sensible operation to combine two exponents based on their scales. As a matter of fact, this coincides with the design of many existing genetic algorithms, which favor mutation over crossover [17].

We can see that there are three main components of the algorithm: generating an initial population, selection, and mutation. In the following we explain each in detail.

Initial Population. We generate n exponents as the initial population based on a uniform distribution, except that the exponents in the interval [median − t, median + t] have a probability five times as high as other exponents. Number median is the median value of all possible exponents, being 1023 for double precision and 127 for single precision. Number t is equal to $2^{\lceil k/2 \rceil}$ in our current implementation, where k is the number of digits in the exponent part. To implement this distribution, we separate the space of exponents into n intervals, where the lengths of the intervals within [median − t, median + t] are five times as small as those outside [median − t, median + t]. For programs with a single floating-point input parameter, we randomly generate an exponent within each interval, and get n inputs. For programs with multiple floating-point input parameters, we randomly pick an interval and generate an exponent within the interval for each parameter. We repeat the generation n times for n sets of parameters. For each generated exponent, we randomly generate the sign bit and the significand to form a floating-point number. Then we run the program in FPDebug with each set of input parameters to get the relative error of its output.

Selection Method. In the selection step, we would like to favor the inputs that lead to a larger error. A standard selection method for this purpose is roulette wheel selection [18], where the probability of selecting individual i is $f_i / \sum_{j=1}^{n} f_j$, where $f_i$ is the fitness (in our case, the relative error) of individual i and n is the total number of individuals. However, this method is not suitable for our case because the probability of hitting a large error is small. If we have not hit a large error, it is likely that our population contains only a few individuals whose errors are slightly larger than the others. Roulette wheel selection cannot reliably select those slightly better individuals because their number is too small. On the other hand, if we do encounter an individual whose error is several orders of magnitude larger than that of other individuals, it is unlikely that we will choose any other individual.

To overcome this problem, we choose group-based rank selection, which is more suitable to our case based on existing studies [18]. We first group the population by their relative errors. Every two individuals in a group differ by no more than two orders of magnitude in relative error. Then we select a group by rank. We first select the group with the highest relative errors with a probability of p. If the first group is not selected, we select the group with the second highest relative errors with a probability of p, and so on. After we have selected a group, we pick a random individual from the group. Based on our experience with a few small programs, we set p = 0.6.

Mutation Operation. As mentioned before, given a floating-point number, our mutation operation adds a value v (which can be positive or negative) to its exponent, flips its sign bit with a probability q, and randomly regenerates its significand. After mutation, we run the program with the mutated input in FPDebug to get the relative error of the output. The value v is decided by a normal distribution $N(0, \sigma^2)$, where $\sigma^2$ is the length of the interval in the initial population in which this exponent falls. This setting ensures that the change to the exponent is always small, and is even smaller around the median because the initial population is already condensed around the median. To ensure the sign bit is not frequently changed, we set q to 0.1 in our current implementation.

Number of Iterations. Since the maximum population size in our algorithm is confined by the number of times that FPDebug can be executed within a time frame, it is important to properly balance the size of the initial population (denoted as n) and the number of mutations (denoted as m). We currently set n = m in our implementation. This setting is different from common genetic algorithms, where n ≪ m. However, since our main task is to hit the small interval of exponents that lead to large errors, it is important to have a large initial population to cover the whole space.

V. EVALUATION

A. Sanity Check

We first evaluated our technique on a set of test subjects collected from related work [10]. These subjects consist of six classic examples of stable and unstable implementations, as shown in Table II, where the first two programs are stable and the rest are unstable. Stable implementations are likely to produce more accurate results than unstable ones.

TABLE II: Sanity Check Results

          Newton    Inv       Root      Poly      Exp       Cos
  LSGA    2.8E-16   3.2E-16   8.1E+76   1.7E-11   1.0E+00   9.2E-01
  BGRT    1.7E-16   2.5E-16   1.3E-14   4.7E-14   2.1E-15   1.2E-16

Our technique is able to confirm those stable implementations and find relevant inputs that lead to large errors for the unstable ones. For each of the two stable programs in the set, the maximal relative error detected by our approach is smaller than $1 \times 10^{-15}$. For each of the four unstable programs in the set, the maximal relative error detected by our approach is larger than $1 \times 10^{-11}$. This result provides preliminary evidence of the feasibility and usefulness of our approach.

Interestingly, in parallel to our work, Chiang et al. [19] also explore search algorithms for inputs that cause large errors, and in their experiments a guided random search algorithm (BGRT) works best. We also compared our technique with BGRT on the six subjects. As we can see from Table II, on none of the subjects does BGRT outperform LSGA, and BGRT also cannot distinguish stable and unstable subjects. One possible explanation is that guided random search cannot exploit the benefit of the fourth observation. Consequently, though BGRT found relatively large errors, it could not hit the largest one.

B. Experiment Overview and Research Questions

To further understand the performance of our approach on real-world programs, we performed an experimental study using 154 functions from the GNU Scientific Library. In our study, we experimentally compared our approach with two standard search techniques, a random search (RAND) and a standard genetic algorithm (STD), serving as the control techniques.

To compare the three techniques, a limit should be set to terminate the search. Based on our testing, executing programs in FPDebug occupies more than 99.5% of the execution time in all three search algorithms. For convenience, we set a limit on the number of times that FPDebug can be invoked by each algorithm. Then we compared the effectiveness of the three techniques on each function under this limit.

In general, our experimental study aims to investigate the following two research questions. The first research question (RQ1) is concerned with which of the three techniques tends to find larger inaccuracies for each experimented function. The second research question (RQ2) concerns whether our approach is able to detect potential accuracy problems in practice.

TABLE III: Subjects with Different Parameters

  Total   1-Para   2-Para   3-Para   4-Para
  154     104      37       8        5
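To make the LSGA side of this comparison concrete, the following is a minimal sketch of Algorithm 1 applied to the motivating example F(x) of Section III. It is an illustration under stated assumptions, not the authors' implementation: it works in single precision rather than the double precision used in the experiments, it replaces FPDebug with the stand-in error |F(x) − 1| (F is exactly 1 in real arithmetic), it scores inputs whose running sum overflows as 0 so that the search targets rounding error, and it reads the interval length as the variance of the mutation step. All helper names (`f32`, `to_float`, `error`, `lsga`, and so on) are hypothetical.

```python
import math
import random
import struct

K, SIG_BITS = 8, 23             # binary32 exponent and significand widths (Table I)
MEDIAN = 2 ** (K - 1) - 1       # 127, the median exponent for single precision
T = 2 ** math.ceil(K / 2)       # half-width of the favoured band around the median
MAX_EXP = 2 ** K - 2            # largest exponent field that is not inf/NaN
P_GROUP, Q_SIGN = 0.6, 0.1      # p and q as set in the paper


def f32(v):
    """Round a binary64 value to the nearest binary32 value."""
    try:
        return struct.unpack("f", struct.pack("f", v))[0]
    except OverflowError:       # magnitude exceeds the binary32 range
        return math.copysign(math.inf, v)


def F(x):
    """Motivating example of Section III, every operation rounded to binary32."""
    n = 8192
    y = z = float(n)
    x = abs(x)
    for _ in range(n):
        y = f32(y + x)
    y = f32(y / z)
    return f32(f32(0.125 + x) / f32(y - 0.875))


def to_float(sign, exp, sig):
    """Assemble a binary32 value from its three bit fields."""
    bits = (sign << 31) | (exp << SIG_BITS) | sig
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def error(ind):
    """Stand-in fitness (assumption): |F(x) - 1| instead of FPDebug's relative error."""
    r = F(to_float(*ind))
    # r == 0 only when the running sum overflowed; scored as 0 by choice of this sketch.
    return 0.0 if r == 0.0 else abs(r - 1.0)


def weight(e):
    return 5 if abs(e - MEDIAN) <= T else 1


# Each exponent appears `weight` times, so equal slices of SLOTS are
# exponent intervals that are five times shorter near the median.
SLOTS = [e for e in range(MAX_EXP + 1) for _ in range(weight(e))]


def initial_population(rng, n):
    pop = []
    for i in range(n):
        lo, hi = i * len(SLOTS) // n, (i + 1) * len(SLOTS) // n
        exp = SLOTS[rng.randrange(lo, max(hi, lo + 1))]
        pop.append((rng.getrandbits(1), exp, rng.getrandbits(SIG_BITS)))
    return pop


def select(rng, scored):
    """Group-based rank selection; a group spans at most two orders of magnitude."""
    groups, top = [], None
    for ind, err in sorted(scored, key=lambda s: s[1], reverse=True):
        if top is None or err * 100 < top:
            groups.append([])
            top = err
        groups[-1].append(ind)
    for g in groups[:-1]:
        if rng.random() < P_GROUP:
            return rng.choice(g)
    return rng.choice(groups[-1])   # fall through to the lowest-ranked group


def mutate(rng, ind, n):
    sign, exp, _ = ind
    var = len(SLOTS) / (n * weight(exp))    # interval length, used as the variance
    exp = min(MAX_EXP, max(0, exp + round(rng.gauss(0.0, math.sqrt(var)))))
    if rng.random() < Q_SIGN:
        sign ^= 1
    return (sign, exp, rng.getrandbits(SIG_BITS))   # significand is regenerated


def lsga(n=100, m=100, seed=0):
    rng = random.Random(seed)
    scored = [(ind, error(ind)) for ind in initial_population(rng, n)]
    for _ in range(m):
        child = mutate(rng, select(rng, scored), n)
        scored.append((child, error(child)))
    best, err = max(scored, key=lambda s: s[1])
    return to_float(*best), err
```

Calling `lsga()` returns the input with the largest stand-in error together with that error. With n = m = 100 the sketch evaluates F 200 times, matching the 100 initial individuals and 100 mutations used for LSGA in the experiments.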
C. Experimental Setup

We conducted our experimental study on a virtual machine running Ubuntu 10.04.4, hosted on a PC with a 2.3GHz Intel Pentium i5-2410M CPU and 6GB memory.

1) Subjects: Our subjects are chosen from the latest version (i.e., version 1.16) of the GNU Scientific Library (GSL)². GSL is an open-source numerical library for C and C++ programmers. The library provides a wide range of mathematical routines such as random number generators, special functions, and least-squares fitting. GSL has been used in previous studies [10], [20] on analyzing programs with floating-point numbers.

GSL has in total 210 functions involving intensive floating-point computations. From these functions, whose total size is 48K lines of code, we chose all the functions whose inputs are all floating-point numbers and whose output is also a floating-point number. There are also three functions on which FPDebug is reported not to work properly [9], so we removed these three functions. As a result, we used 154 functions in our experimental study. Each of the 154 functions has up to 4 parameters. Table III depicts the numbers of functions having 1, 2, 3, and 4 parameters, respectively. All the parameters and the return values of the 154 functions are of double precision.

2) Control Techniques: As mentioned earlier, we use a random search and a standard genetic algorithm as control techniques. The random search randomly generates every bit of the input parameters at each iteration, and returns the maximum relative error over all iterations.

The standard genetic algorithm is designed by configuring the classic framework for genetic algorithms [21] using standard operations [21], [22]. More concretely, the algorithm starts from N randomly generated inputs as the initial generation. For each generation, the algorithm repeats N/2 reproduction iterations to produce N children for the next generation, with each iteration producing two children. In every reproduction iteration, the algorithm picks two individuals from the current generation based on roulette wheel selection. The weight of each individual is the logarithm of the relative error. As recommended by the framework [21], computing logarithms is suitable for largely different fitnesses. Then each pair of parameters at the same position in the two individuals is crossed over with a probability C.
The crossover is performed by treating the numbers as bit vectors and exchanging the first i bits of the two numbers, with a randomly generated i. Finally, every bit of the two individuals is flipped with probability M. After a new generation is produced, the process starts again with the new generation. Finally, the individual with the largest error in all generations is returned. Following the recommendation in existing papers [21], [22], we set N = 20, C = 0.96, M = 0.1.

² http://www.gnu.org/software/gsl/

3) Experimental Procedure: To answer the first research question, for each subject function, we used each of the three search techniques to calculate the maximum relative error that the technique can find. The limit on the number of FPDebug invocations is 200, which is approximately equal to 60 seconds. In other words, random search will generate 200 inputs, the standard genetic algorithm will have 10 generations with 20 individuals each, and our algorithm will have 100 initial individuals and 100 mutations.

There are mainly two reasons for us to use a relatively short iteration limit. Of course, more experiments are needed to further study the practical iteration limit for each of the techniques.

First, all the subject functions are library functions, which are building blocks for real-world software. As a result, the execution time of one test input for a real-world program can be hundreds or even thousands of times that for a subject function in our experimental study. Thus, a small iteration limit for our subjects may be equivalent to a long execution time for real-world programs. In other words, only techniques that can achieve satisfactory results on our subjects within a short iteration limit would have practical value for real-world programs.

Second, as the number of subject functions in our experimental study is large, a relatively small iteration limit helps us control our experimental procedure. Note that each technique may need to be executed many times against each subject function due to calibration.

The second research question is difficult to answer because it is difficult to decide whether a large error is a problem or not. Large errors may be fundamentally inevitable in many computations and are not considered problems. Interestingly, many GSL functions report an estimated absolute error for each execution. Using this information, we conservatively deem a large relative error (larger than 0.1%) a potential problem when the actual absolute error is 10 times larger than the estimated one, as the large error is probably unexpected by the developer and may cause serious consequences. For the generated test inputs that trigger an error larger than 0.1%, we invoke the associated functions with these inputs in FPDebug, and compare the estimated absolute error with the actual error reported by FPDebug. When the actual error is 10 times larger than the estimated one, we consider it a potential problem.

D. Threats to Validity

The main threat to internal validity lies in possible faults in our implementation. To reduce this threat, we reviewed all our source code before conducting the experiments. Note that, as Benz et al. [9] have made their tool publicly downloadable, we implemented the three techniques by directly invoking their tool, which helps us further reduce this threat.

The main threat to external validity is concerned with the representativeness of our subjects. To reduce this threat, we used a large number of widely used functions (which have also been used in previous studies [10], [20]) as subjects in our study. Conducting more experiments using more real-world programs as subjects would help us further reduce this threat.

TABLE IV: Maximum Inaccuracies Detected

  Total   RAND      STD        LSGA        Tied
  154     11 (7%)   24 (16%)   105 (68%)   14 (9%)

TABLE V: Sign Test on Inaccuracy Detection

                   n+    n−   N     p
  LSGA vs. RAND    127   12   139   4.14e-22
  LSGA vs. STD     110   30   140   2.46e-11
  STD vs. RAND     93    40   133   6.52e-06
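The p-values in Table V can be approximately re-derived from the win counts n+ and n−. The sketch below assumes a two-sided sign test computed with the normal approximation and a continuity correction; the paper does not state which variant was used, so this is an assumption, and the function name `sign_test_p` is hypothetical. Under that assumption the results agree with the tabulated values to within a few percent.

```python
import math


def sign_test_p(n_plus, n_minus):
    """Two-sided sign-test p-value, normal approximation with continuity correction.

    Ties are assumed to have been dropped already, so N = n_plus + n_minus.
    """
    n = n_plus + n_minus
    # Under the null hypothesis n_plus ~ Binomial(N, 1/2): mean N/2, sd sqrt(N)/2.
    # (|n_plus - N/2| - 1/2) / (sqrt(N)/2) simplifies to the expression below.
    z = max(0.0, (abs(n_plus - n_minus) - 1) / math.sqrt(n))
    # Two-sided tail probability of a standard normal: 2 * (1 - Phi(z)).
    return math.erfc(z / math.sqrt(2))


for label, n_plus, n_minus in [("LSGA vs. RAND", 127, 12),
                               ("LSGA vs. STD", 110, 30),
                               ("STD vs. RAND", 93, 40)]:
    print(f"{label}: p = {sign_test_p(n_plus, n_minus):.3g}")
```

An exact binomial tail would give even smaller p-values for the first two rows, so the conclusion drawn from Table V does not depend on this choice.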
The main threat to construct validity is the limit on the number of times FPDebug can be invoked. To reduce this threat, we used a short limit to make the experimented techniques applicable to real-world programs whose execution time may be much longer than that of our subjects. Note that a technique that can detect inaccuracy in a short time would become more effective (or at least as effective) when used under a longer time limit.

E. Experimental Results

In this subsection, we present the experimental results for the two research questions. All our experimental data are available online at http://sei.pku.edu.cn/%7exiongyf04/papers/ICSE15.html.

1) RQ1: Effectiveness of Inaccuracy Detection: Given a subject (denoted as s) and a technique (denoted as t), if the maximum relative error returned by the technique is larger than both of the maximum relative errors returned by the other two techniques, we deem that t is the best technique for s. Accordingly, for each of the three techniques, we count the number of subjects for which the technique is the best. For some subjects, no single technique is superior to both of the other two; in that case we deem that no technique is the best and denote this situation as a tie. The result of this comparison is shown in Table IV.

As we can see from the table, LSGA finds the maximum errors in the majority of the subjects, while random search finds the maximum errors in the smallest number of subjects. This result indicates that LSGA is more effective than the other two algorithms, while the standard genetic algorithm is better than random search.

To further confirm whether the differences between the three techniques are statistically significant, we perform the sign test on each pair of techniques. The result of our sign test on inaccuracy detection is shown in Table V. From the table we can see that the differences between each pair of techniques are significant, as p is much smaller than 0.05, the usual threshold for a significant difference. In particular, LSGA has a very small p when compared with the other two techniques, which indicates that LSGA is clearly superior to the other two techniques.

The sign test only considers which technique performs better for each subject, but does not consider whether the differences between the relative errors found by different techniques are significant. If the relative errors found by two techniques are very close, both techniques are usable in practice. We deem that the difference between two relative errors, $e_1$ and $e_2$, is significant if $e_1/e_2$ or $e_2/e_1$ is larger than 10. Then we calculate, for each pair of techniques, how many significantly larger errors one technique found over the other. The result is shown in Table VI. The "Left" column shows how many significantly larger inaccuracies the left technique found than the right. Similarly, the "Right" column shows how many significantly larger inaccuracies the right technique found than the left. As we can see from the table, LSGA found significantly larger errors than the other two techniques in a large number of subjects, while the other two techniques found significantly larger errors than LSGA only in rare cases.

TABLE VI: Significantly Larger Inaccuracies Detected

                  Left        Right
LSGA vs. RAND     55 (36%)    3 (2%)
LSGA vs. STD      44 (29%)    5 (3%)
STD vs. RAND      25 (16%)    9 (6%)

Although random search performed the worst among the three techniques, the above result also suggests that random search still found relatively large errors on some subjects. To further understand why this happened, we analyze a sample function for which random search returns a large error. This function is gsl_sf_bessel_j1, which computes the spherical Bessel function

$$j_1(x) = \frac{\sin(x)}{x^2} - \frac{\cos(x)}{x}$$

For this function, random search reaches a maximum relative error of 1.089913e+02, a fairly large error. The implementation code of this function is listed below:

    int gsl_sf_bessel_j1_e(const double x, gsl_sf_result * result)
    {
     1:   double ax = fabs(x);
     2:
     3:   /* CHECK_POINTER(result) */
     4:
     5:   if (x == 0.0) {
     6:     result->val = 0.0;
     7:     result->err = 0.0;
     8:     return GSL_SUCCESS;
     9:   }
    10:   else if (ax < 3.1*GSL_DBL_MIN) {
    11:     UNDERFLOW_ERROR(result);
    12:   }
    13:   else if (ax < 0.25) {
    14:     const double y = x*x;
    15:     const double c1 = -1.0/10.0;
    16:     const double c2 = 1.0/280.0;
    17:     const double c3 = -1.0/15120.0;
    18:     const double c4 = 1.0/1330560.0;
    19:     const double c5 = -1.0/172972800.0;
    20:     const double sum = 1.0 + y*(c1 + y*(c2 + y*(c3 + y*(c4 + y*c5))));
    21:     result->val = x/3.0 * sum;
    22:     result->err = 2.0 * GSL_DBL_EPSILON * fabs(result->val);
    23:     return GSL_SUCCESS;
    24:   }
    25:   else {
    26:     gsl_sf_result cos_result;
    27:     gsl_sf_result sin_result;
    28:     const int stat_cos = gsl_sf_cos_e(x, &cos_result);
    29:     const int stat_sin = gsl_sf_sin_e(x, &sin_result);
    30:     const double cos_x = cos_result.val;
    31:     const double sin_x = sin_result.val;
    32:     result->val = (sin_x/x - cos_x)/x;
    33:     result->err = (fabs(sin_result.err/x) + fabs(cos_result.err))/fabs(x);
    34:     result->err += 2.0 * GSL_DBL_EPSILON * (fabs(sin_x/(x*x)) + fabs(cos_x/x));
    35:     result->err += 2.0 * GSL_DBL_EPSILON * fabs(result->val);
    36:     return GSL_ERROR_SELECT_2(stat_cos, stat_sin);
    37:   }
    }

In the above code, GSL_DBL_MIN is about 2.22507e-308. This function divides the input space into four segments, and returns (1) a constant number (line 6), (2) an underflow error (line 11), (3) the value calculated by series expansion (lines 14-21), or (4) the value calculated by standard library functions (lines 28-32).

Since the program has no loop, the only way to produce such a large error is to subtract two similar numbers, known as cancellation [16]. Cancellation can happen at addition and subtraction, which occur on line 20 and line 32. In the case of line 20, because the path condition (line 13) is $|x| < 0.25$, y and each $c_i$ defined between lines 14 and 19 differ greatly in scale, so no two similar values are subtracted. On the other hand, a large error may be triggered on line 32, e.g., when x is a large number and cos(x) happens to be near zero.

As a result, the probability of producing a large error for a random input is determined by the probability of executing line 32 and the probability of producing a large error when line 32 is executed. We obtain the former by analyzing the path condition and the latter by sampling. The path condition of line 32 is $|x| \ge 0.25$, and 50.14% of all double-precision floating-point numbers fall in this range. We then randomly created 100 test inputs satisfying $|x| \ge 0.25$, and 15 of them generated a relative error larger than 1.0. Putting the two probabilities together, 7.521% of random inputs will trigger an extremely large error. It is therefore very easy for random search to locate an input within this range. Since 200 test inputs were created in our experiment, there is a probability of 99.99995% of triggering a large error.

The analysis of this sample function explains why random search can find large errors in some cases: there exist subjects for which very significant inaccuracies can be easily triggered by chance. Nevertheless, our results also suggest that most programs do not belong to this category, and metaheuristic search-based approaches would be useful in locating large errors in these programs.

2) RQ2: Ability to Identify Potential Problems: In the previous experiment, our algorithm generated test inputs for 59 functions where the relative error is larger than 0.1%. We invoked these functions with the generated test inputs, and the functions returned estimated absolute errors together with the result. By comparing the actual and estimated absolute errors, we found 18 functions that have potential problems of inaccuracy. This result again outperforms the other two algorithms significantly: STD found seven potential problems and RAND found five.

The detailed result for the 18 functions is shown in Table VII. Each row is a potential problem our approach identifies. The first column lists the function names, the second column lists the relative errors reported by FPDebug, the third column lists the estimated absolute errors reported by the subject functions, and the last column lists the absolute errors reported by FPDebug. We can see that in most cases the actual errors are many orders of magnitude larger than the estimated ones, indicating potentially serious problems in practice.

TABLE VII: Functions with Potential Bugs

Name                    Relative Error    Estimated Absolute Error    Reported Absolute Error
airy_Ai_deriv           1.54E+06          1.04E-06                    1.35E+00
airy_Ai_deriv_scaled    1.54E+06          1.04E-06                    1.35E+00
clausen                 5.52E-02          6.37E-17                    2.31E-02
eta                     9.58E+13          1.27E+37                    2.71E+50
exprel_2                2.85E+00          4.44E-16                    7.41E-01
gamma                   1.07E-02          6.94E-14                    1.05E-01
synchrotron_1           5.35E-03          4.47E-14                    3.07E-04
synchrotron_2           3.67E-03          6.39E-14                    1.86E-04
zeta                    9.58E+13          3.41E+18                    1.19E+32
zetam1                  1.42E-02          1.51E+19                    7.42E+30
bessel_Knu              6.08E-03          3.33E+22                    9.05E+34
bessel_Knu_scaled       6.08E-03          2.66E+22                    9.05E+34
beta                    9.21E-03          4.91E-13                    2.04E-01
ellint_E                8.92E-03          1.58E-16                    3.14E-03
ellint_F                8.79E-03          1.86E-16                    3.64E-03
gamma_inc_Q             1.36E+13          8.88E-16                    1.25E-12
hyperg_0F1              5.80E+06          2.08E+37                    7.33E+49
hyperg_2F0              4.35E-03          5.20E+02                    3.19E+12

By searching the GSL repository, we found that none of the problems in the 18 functions has been reported as a bug, while there exist bug reports reporting relative errors of a few orders of magnitude. We have submitted bug reports for the identified functions, but have not received any feedback yet (possibly because only one programmer is actively maintaining GSL now).

VI. LIMITATIONS AND FUTURE WORK

First, our approach relies on FPDebug to measure the accuracy of the output, and FPDebug measures the accuracy by promoting the precision of floating-point numbers. For example, if the original program uses single-precision numbers, FPDebug will use double-precision numbers to perform the same computation, and compare the results to get the relative error. In theory, this approach may not always produce the correct result, because precision-specific treatments may be used in programs. For example, the original program may predict some large error for the precision it uses and add a pre-defined value to the result to compensate for the error, or the original program may use bit operators to accelerate the computation, which usually works only for a certain precision. Raising precision on these programs may not lead to more accurate results. As a result, the inaccuracies reported by our approach are only indications of potential accuracy problems and are not guaranteed to be bugs. However, this is probably not a problem in practice. First, programmers are able to identify the false positives easily, as they know whether any precision-specific treatment is used in their programs. Second, precision-specific treatments are not very commonly used in practice. As a matter of fact, precision adjustment has been used in different approaches [6], [23]–[25], and no problem has been reported as far as we know.

Second, as our approach is based on testing, we cannot guarantee that the inaccuracy detected by our approach is always the maximum inaccuracy the program under test can produce. Similarly, when several inputs in the input domain of the program under test trigger significant inaccuracy, our approach may find only one of them. Note that, when there are several independent inaccuracy-related faults, it is very likely that these faults lead to multiple unrelated inaccuracy-inducing inputs. In fact, as we do not know the maximum inaccuracy of the program under test, we also do not know how close the inaccuracy produced by our approach is to the maximum inaccuracy. In future work, we need to establish benchmarks of maximum inaccuracy through exhaustive search, and use the maximum inaccuracy to evaluate our approach. Furthermore, we should also investigate the injection of inaccuracy faults to create subject programs with controllable inaccuracy.

Third, although FPDebug provided by Benz et al. [9] needs to instrument the binary code of the program under test, our approach itself is in principle a black-box approach. That is to say, our approach does not analyze the code of the program under test, but relies completely on the results produced by FPDebug. This strategy should be suitable for programs with simple control structures like the function discussed in Section III. But there are also programs with both complex control structures and intensive floating-point computations. In future work, we plan to investigate approaches that can also utilize information (e.g., coverage information) obtained from code analysis. One possible benefit of using coverage information is efficiency: collecting coverage information is much cheaper than executing Benz et al.'s tool, so using coverage information to avoid always executing the tool could make our approach more efficient. Another possible benefit is that coverage information may guide our approach to explore more paths and thus find more than one independent inaccuracy-related fault.

Fourth, our approach is currently applicable only to floating-point parameters. If the function has other types of parameters, the user has to specify pre-defined values for them. In future work, we plan to investigate approaches that can also deal with other types of parameters. One possible way is to devise suitable representations of other types of parameters in our genetic algorithm, so that the algorithm can be naturally extended to these types. Another possible way is to use different heuristics (e.g., a heuristic based on coverage) for other types, considering that they may play different roles from floating-point numbers in programs with intensive floating-point calculation.

Finally, it would be interesting to compare our approach with static analysis approaches that give an upper bound of errors. However, this is hard to do at the current stage because we do not know the maximum possible errors of our subjects. Furthermore, to the best of our knowledge, there exists no nontrivial benchmark with precise bounds of errors. When a static analysis produces a large upper bound while our approach finds only a small error, we cannot decide which approach performs better. In future work, benchmarks of floating-point error can be developed so that different approaches can be compared.

VII. RELATED WORK

Static analysis. Many approaches have been proposed to statically analyze the possible upper bound of errors. These approaches are typically built upon interval arithmetic [4] or affine arithmetic [5]. Interval arithmetic represents each number as a pair of a lower bound and an upper bound, and replaces basic arithmetic operations with operations between intervals. For example, adding two intervals results in another interval representing the maximal possible range of the result: $[a, b] + [c, d] = [a + c, b + d]$. Affine arithmetic enhances interval arithmetic by distinguishing errors coming from different sources. In affine arithmetic, each number is represented as an affine form $v + x_1\epsilon_1 + x_2\epsilon_2 + \cdots$, where $v$ is the precise value, $x_i$ is the error coming from a source, and $\epsilon_i$ is a symbol representing the source. By differentiating errors by their sources, affine arithmetic can produce better results in operations such as $n - n$, where the operands contain errors from the same sources.

Typical static analysis approaches replace the numbers and operations in the target program with their interval or affine forms, and use standard program analysis techniques such as symbolic execution and data-flow analysis to obtain the possible errors on the output. For example, Putot et al. [2] present a static analysis that relies on abstract interpretation with interval forms. Goubault and Putot [3] also propose an approach based on affine arithmetic to analyze nontrivial numerical computations. However, these approaches usually consider the possible errors for the whole input space, and cannot identify which input could produce a large error. Furthermore, due to the nature of static analysis, a large interval on the output does not guarantee the existence of a large error.

Static verification. Another branch of approaches tries to verify the precision of floating-point programs. Given a property about floating-point accuracy, these approaches try to verify, automatically or interactively, whether the property holds in a program. Boldo et al. [26], [27] build support for floating-point C programs in Coq, allowing one to easily build proofs for floating-point C programs. Ayad and Marché [28] propose the use of multiple provers to try to automatically generate proofs of floating-point properties. Darulova and Kuncak [1] propose a type system to guarantee the precision of floating-point programs. However, the verification-based approaches suffer from the same problem as static analysis: failing to prove a property does not necessarily imply the existence of a large error, and no input can be provided for further debugging.

Precision tuning. FPDebug [9], the key component enabling our approach, dynamically analyzes the program by performing all floating-point computations side by side in higher precision. The difference between the standard result and the high-precision result is the error of the result. Bao and Zhang [6] propose to reduce the cost of detection by not computing the precise error but marking and tracking potentially inaccurate values. Based on similar ideas of tuning precision, Lam et al. [24], Rubio-González et al. [25], and Schkufza et al. [23] propose approaches that automatically reduce the precision of floating-point computation to enhance performance with an acceptable loss of precision. These approaches serve as evidence that changing precision is a feasible technique for various purposes.

External errors. Our approach focuses on internal errors, that is, how much inaccuracy may be introduced during the computation of the program. External errors are errors from sources outside the scope of the program. Such external errors in the input may be magnified during the computation of the program and result in significantly inaccurate output. Different approaches have been proposed to analyze how robust a program is under an inaccurate input. Recent work includes static verification of robustness by Chaudhuri et al. [29], dynamic analysis by Tang et al. [10], and dynamic sampling by Bao et al. [30]. However, these approaches cannot be used for internal errors, as they are concerned with the execution of the subject program but not the precise output.

Test generation for floating-point programs. Test input generation is an important research topic and has been approached from different angles [22], [31], [32]. A typical approach [33] is to use symbolic execution to explore different paths and generate test inputs by solving path constraints. However, constraints with floating-point operations are usually difficult to solve. Miller and Spooner [7] first proposed the use of search-based techniques instead of symbolic execution to generate test input data. Recently, Bagnara et al. [8] proposed to use several search heuristics to enhance constraint solving in concolic testing for floating-point programs. However, their goal of test generation is to increase path coverage, not to detect floating-point inaccuracies. As far as we are aware, our approach is the first test generation approach for the detection of floating-point inaccuracies.

Besides test input generation, test oracle generation is also an important problem in test automation. Recently, Zhang et al. [34] proposed to infer metamorphic relations for regression testing. However, their approach only works for regression testing but not for initial testing.

VIII. CONCLUSION

In this paper we have shown that, with recent advances in the dynamic analysis of floating-point errors, search-based test generation that aims at maximizing the likely errors has become possible. By exploiting the statistical characteristics of large floating-point inaccuracies, we have designed a specialized genetic algorithm to efficiently search for large inaccuracies in numerical programs. Our experimental results demonstrate that our approach is able to find many large inaccuracies in a widely used library, indicating the proneness of numerical programs to large inaccuracies in practice.

REFERENCES

[1] E. Darulova and V. Kuncak, "Trustworthy numerical computation in Scala," in Proc. OOPSLA, 2011, pp. 325–344.
[2] S. Putot, E. Goubault, and M. Martel, "Static analysis-based validation of floating-point computations," in Numerical software with result verification, 2004, pp. 306–313.
[3] E. Goubault and S. Putot, "Static analysis of numerical algorithms," in Proc. SAS, 2006, pp. 18–34.
[4] T. Hickey, Q. Ju, and M. H. Van Emden, "Interval arithmetic: From principles to implementation," Journal of the ACM, vol. 48, no. 5, pp. 1038–1068, 2001.
[5] J. Stolfi and L. De Figueiredo, "An introduction to affine arithmetic," TEMA Tend. Mat. Apl. Comput., vol. 4, no. 3, pp. 297–312, 2003.
[6] T. Bao and X. Zhang, "On-the-fly detection of instability problems in floating-point program execution," in Proc. OOPSLA, 2013, pp. 817–832.
[7] W. Miller and D. L. Spooner, "Automatic generation of floating-point test data," IEEE Transactions on Software Engineering, vol. 2, no. 3, pp. 223–226, 1976.
[8] R. Bagnara, M. Carlier, R. Gori, and A. Gotlieb, "Symbolic path-oriented test data generation for floating-point programs," in Proc. ICST, 2013, pp. 1–10.
[9] F. Benz, A. Hildebrandt, and S. Hack, "A dynamic program analysis to find floating-point accuracy problems," in Proc. PLDI, 2012, pp. 453–462.
[10] E. Tang, E. T. Barr, X. Li, and Z. Su, "Perturbing numerical calculations for statistical analysis of floating-point program (in)stability," in Proc. ISSTA, 2010, pp. 131–142.
[11] W. Kahan, "IEEE standard 754 for binary floating-point arithmetic," Lecture Notes on the Status of IEEE, vol. 754, no. 94720-1776, p. 11, 1996.
[12] E. Falkenauer, Genetic algorithms and grouping problems. John Wiley & Sons, Inc., 1998.
[13] N. J. Higham, Accuracy and stability of numerical algorithms. SIAM, 2002.
[14] B. D. McCullough and H. D. Vinod, "The numerical reliability of econometric software," Journal of Economic Literature, pp. 633–665, 1999.
[15] S. Zhao, "A calculator with controlled error, example section (in Chinese)," 2015, [Accessed 12-February-2015]. [Online]. Available: http://www.zhaoshizhong.org/download.htm
[16] D. Goldberg, "What every computer scientist should know about floating-point arithmetic," ACM Comput. Surv., vol. 23, no. 1, pp. 5–48, Mar. 1991.
[17] D. B. Fogel, Evolutionary computation: toward a new philosophy of machine intelligence. John Wiley & Sons, 2006, vol. 1.
[18] L. D. Whitley et al., "The genitor algorithm and selection pressure: Why rank-based allocation of reproductive trials is best," in ICGA, vol. 89, 1989, pp. 116–123.
[19] W.-F. Chiang, G. Gopalakrishnan, Z. Rakamaric, and A. Solovyev, "Efficient search for inputs causing high floating-point errors," in PPoPP. ACM, 2014, pp. 43–52.
[20] E. T. Barr, T. Vo, V. Le, and Z. Su, "Automatic detection of floating-point exceptions," in Proc. POPL, 2013, pp. 549–560.
[21] J. J. Grefenstette, "Optimization of control parameters for genetic algorithms," Systems, Man and Cybernetics, IEEE Transactions on, vol. 16, no. 1, pp. 122–128, 1986.
[22] P. McMinn, "Search-based software test data generation: a survey," Software Testing, Verification and Reliability, vol. 14, no. 2, pp. 105–156, 2004.
[23] E. Schkufza, R. Sharma, and A. Aiken, "Stochastic optimization of floating-point programs with tunable precision," in PLDI, 2014, pp. 53–64.
[24] M. O. Lam, J. K. Hollingsworth, B. R. de Supinski, and M. P. Legendre, "Automatically adapting programs for mixed-precision floating-point computation," in ICS, 2013, pp. 369–378.
[25] C. Rubio-González, C. Nguyen, H. D. Nguyen, J. Demmel, W. Kahan, K. Sen, D. H. Bailey, C. Iancu, and D. Hough, "Precimonious: Tuning assistant for floating-point precision," in SC, 2013, pp. 27:1–27:12.
[26] S. Boldo and J.-C. Filliâtre, "Formal verification of floating-point programs," in Proc. ARITH, 2007, pp. 187–194.
[27] S. Boldo and G. Melquiond, "Flocq: A unified library for proving floating-point algorithms in Coq," in Proc. ARITH, 2011, pp. 243–252.
[28] A. Ayad and C. Marché, "Multi-prover verification of floating-point programs," in Proc. IJCAR, 2010, pp. 127–141.
[29] S. Chaudhuri, S. Gulwani, R. Lublinerman, and S. Navidpour, "Proving programs robust," in ESEC/FSE '11, 2011, pp. 102–112.
[30] T. Bao, Y. Zheng, and X. Zhang, "White box sampling in uncertain data processing enabled by program analysis," in OOPSLA '12, 2012, pp. 897–914.
[31] T. Xie, L. Zhang, X. Xiao, Y. Xiong, and D. Hao, "Cooperative software testing and analysis: Advances and challenges," Journal of Computer Science and Technology, vol. 29, no. 4, pp. 713–723, 2014.
[32] D. Hao, L. Zhang, M. Liu, H. Li, and J. Sun, "Test-data generation guided by static defect detection," Journal of Computer Science and Technology, vol. 24, no. 2, pp. 284–293, 2009.
[33] J. C. King, "Symbolic execution and program testing," Communications of the ACM, vol. 19, no. 7, pp. 385–394, 1976.
[34] J. Zhang, J. Chen, D. Hao, Y. Xiong, B. Xie, L. Zhang, and H. Mei, "Search-based inference of polynomial metamorphic relations," in ASE, 2014, pp. 701–712.
