REWIND: Recovery Write-Ahead System for In-Memory Non-Volatile Data-Structures

Andreas Chatzistergiou, University of Edinburgh, UK, a.chatzistergiou@sms.ed.ac.uk
Marcelo Cintra, Intel, Germany, marcelo.cintra@intel.com
Stratis D. Viglas, University of Edinburgh, UK, sviglas@inf.ed.ac.uk

ABSTRACT
Recent non-volatile memory (NVM) technologies, such as PCM, STT-MRAM and ReRAM, can act as both main memory and storage. This has led to research into NVM programming models, where persistent data structures remain in memory and are accessed directly through CPU loads and stores. Existing mechanisms for transactional updates are not appropriate in such a setting as they are optimized for block-based storage. We present REWIND, a user-mode library approach to managing transactional updates directly from user code written in an imperative general-purpose language. REWIND relies on a custom persistent in-memory data structure for the log that supports recoverable operations on itself. The scheme also employs a combination of non-temporal updates, persistent memory fences, and lightweight logging. Experimental results on synthetic transactional workloads and TPC-C show the overhead of REWIND compared to its non-recoverable equivalent to be within a factor of only 1.5 and 1.39 respectively. Moreover, REWIND outperforms state-of-the-art approaches for data structure recoverability as well as general purpose and NVM-aware DBMS-based recovery schemes by up to two orders of magnitude.

1. INTRODUCTION
Non-volatile memory (NVM) technologies, such as PCM, STT-MRAM and ReRAM, raise the prospect of persistent byte-addressable random access memory with large enough capacity to double as storage. By itself this would allow applications to store their persistent data in main memory by mounting a portion of the
file system to it. This introduces NVM into the data management programming stack, but in a far from ideal manner. Consider a typical multi-tier application: the programmer decides on the application-level control and data structures, and then decides on the storage-level persistent representation of the data structures. Specialized APIs (e.g., embedded SQL, JDBC, etc.) translate data between the two runtimes using SQL as the intermediate language, in a cumbersome and sometimes error-prone process. Moreover, data may be replicated in both DRAM and NVM, while the byte-addressability of NVM is not leveraged. Clearly, this is suboptimal. An alternative is to port an in-memory database system to persistent memory. That would make use of byte addressability, but it would still require data be replicated and represented in two data models.

This work is licensed under the Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-nd/3.0/. Obtain permission prior to any use beyond those covered by the license. Contact copyright holder by emailing info@vldb.org. Articles from this volume were invited to present their results at the 41st International Conference on Very Large Data Bases, August 31st - September 4th 2015, Kohala Coast, Hawaii. Proceedings of the VLDB Endowment, Vol. 8, No. 5. Copyright 2015 VLDB Endowment 2150-8097/15/01.

We argue that we need a solution that is not intrusive to the programmer and seamlessly integrates the application's data structures with their persistent representation. We target use-cases where the data owner has full control of the data, foresees little change to the schema, and would like to tightly co-design the schema with the operations for performance. These use-cases capture a large group of contemporary applications. Indeed, persistence APIs are being used across a variety of operating systems [1, 18]. These platforms support either a persistent storage manager [22] or an embedded database [16]. Our stance is that in such scenarios we are better off integrating the storage manager and the application memory spaces. By doing so we enable the use of arbitrary persistent data structures in a lightweight software stack that significantly reduces the cost of managing data [15, 27]. To address these issues we introduce REWIND: a user-mode library that enables transactional recoverability of an arbitrary set of persistent updates to main memory data. The runtime system of REWIND transparently manages a log of program updates to the critical data and handles both commit and recovery of transactions. By tracking the operations of the transactions that commit the runtime can identify the point of failure and can resume operation relying on the consistency of critical data.

Our work stems from an alternative persistent memory access model that has gained interest recently: directly programming with persistent memory through mechanisms such as persistent regions [14] or persistent heaps [5, 31]. Persistent data is accessed directly by the processor with a load-store interface and with (mostly) automatic persistence without interacting with the I/O software stack. This model is highly disruptive as it enables a new class of data management systems in which both user data and database metadata are managed entirely in memory as one would manage volatile data [29]. Thus, some fundamental assumptions on programming interfaces and software architecture need to be revised as persistent data needs to be directly managed by the programmer's code using imperative languages.

The programmer-visible API of REWIND offers two main functionalities: one to demarcate the beginning and end of transactions and one to log updates to critical data. Our intention is to completely do away with the second functionality by relying on compiler support so that the programmer only needs to identify the critical data. The key challenge of using persistent in-memory data structures is guaranteeing consistent data updates in the presence of system failures (e.g., power failures and system crashes). This differs from transactional memory where the main intention is to deal optimistically only with atomicity and isolation. It also differs from non-recoverable device failure as this is an orthogonal issue and is hardware-related; we only target failures due to system and software malfunctions.

REWIND provides full atomicity and durability for persistent memory through write-ahead logging. While the principles of the mechanism are still applicable to persistent memory, its implementation and tradeoffs must be revisited given the significant differences in access latencies and synchronization control (see, e.g., [10, 12]). Thus, REWIND overcomes a number of challenges that arise in this new context.

First, the processing model is that persistent data is on byte-addressable NVM, accessible directly from user code through CPU loads and stores.¹ Traditionally, data updates are first performed in volatile memory. It is thus possible to delay making log entries persistent until the transaction commits or the data updates are purged from main memory. In REWIND, updates are done directly on NVM data: the log entries must be made persistent immediately, and ahead of the data updates. We achieve this through enhanced versions of memory fences (i.e., barriers that enforce ordering and persistence to preceding instructions), cacheline flushes and non-temporal stores (i.e., direct to NVM stores that bypass the cache) with persistence guarantees.

REWIND uses physical logging as it fits better with imperative languages and allows easier compiler support. However, it might result in more log records than logical/physiological logging when memory blocks are shifted in memory. Then, the log itself must be manipulated atomically in a recoverable way. Traditionally, the log is maintained in volatile memory and pushed to persistent storage through system calls. In REWIND, the log itself resides in persistent main memory and updates are made in-place. Transactional handling of failure of log updates is attained with carefully crafted data structures and code sequences. Furthermore, performance is relative to a baseline with the low cost of individual memory operations. Thus, logging must be optimized to incur only a small increase in the cost of a memory operation. In REWIND, we guarantee this with minimalist data structures and code sequences. Moreover, while contemporary systems offer record level locking from a data-centric perspective, they use coarse-grained page-level latching internally. REWIND employs fine-grained latching at a log record granularity: this enables more efficient and flexible locking mechanisms. Finally, the majority of recovery managers based on ARIES [20] are implemented within DBMSs. Thus, they hide data management behind some data model (e.g., relational) and allow data manipulation through a well-defined query language (e.g., SQL). REWIND is implemented as a user-mode library that can be linked to any native application, giving the programmer full access to the data using an arbitrary sequence of imperative commands. Moreover, the design of REWIND itself is such that it can be straightforwardly embedded into the compiler so that the disruption to user code is further minimized.

¹ NAND-based battery-backed NV-DIMMs already support this (see also http://www.smartstoragesys.com/pdfs/ULLtraDIMM_overview.pdf). Newer technologies will bring more practical implementations.

Figure 1: Transactional access to application data.

Contributions and organization. Our contributions and the structure of the rest of this paper are as follows:
- We introduce REWIND, a user-mode library for logging and transaction management in NVM. We explore the design space of REWIND (Section 2) via four configurations that result from choosing between: (a) two different log implementations optimized for minimizing either logging overhead or search speed; and (b) forcing or not user data to non-temporal stores.
- We describe two ways to implement the log completely in persistent memory, so it is recoverable and atomic as well. We then present two optimized log versions that further reduce the write overhead (Section 3).
- By leveraging the recoverable log we show how to enable in-memory persistent data structures. We present how these new mechanisms can be incorporated into imperative general-purpose languages through the REWIND library and runtime (Section 4).
- We analyze the sensitivity of REWIND to its parameters and show how it can be configured to deliver low-overhead transactional processing and recoverability of data structures in NVM. We compare REWIND to state-of-the-art recovery managers as well as to traditional and NVM-aware DBMS-based techniques. REWIND's overhead is within a factor of 1.5 from its non-recoverable counterpart, while it outperforms the competition by up to two orders of magnitude (Sections 5.1 and 5.2).
- We use a modified version of TPC-C to show how REWIND enables the co-design of algorithms and data structures. Workload- and program-specific optimizations result in a REWIND performance within a factor of 1.3 from its non-recoverable version (Section 5.3).
Finally, we present related work in Section 6 and conclude and identify future work directions in Section 7.

2. SYSTEM OVERVIEW
REWIND is a user-mode recovery runtime system that can be used by programmers and compilers to provide atomic recoverability to arbitrary code that operates on persistent data structures in NVM. We envision the REWIND library being statically linked with executables, but other variations such as dynamic linking or a shared library could also be developed. Thus, REWIND can be used then as a standalone recovery manager for individual applications, or could be used as the building block of a larger, multi-user data management system. We view REWIND as a fundamental building block towards introducing persistence at the system level, especially when data outlives applications.

Table 1: Pros and cons of the options of Figure 1.
Scheme            Pros                                  Cons
Transactional FS  portability, scalability              programmability, flexibility
DBMS              robustness, scalability               initial perf. cost, flexibility
REWIND            programmability, initial perf. cost   disruptive model

In Figure 1 we show the options available for transactional data management. The application can interface with a DBMS (Application C), or with a file system that offers transactional access to user data (Application B) [7, 28]. The library approach of REWIND (Application A) operates directly on data stored in NVM. This requires an NVM-aware memory manager in the OS [5, 31]. Table 1 summarizes the high-level pros and cons of each option. The use of a file system leads to portable data formats, but suffers in terms of programmability and
flexibility by expecting the programmer to manage both an in-memory and a serialized on-disk version of the data. The DBMS is a tried approach, but limits flexibility by imposing a data model and query API, and suffers from the overheads of a client-server architecture and a highly complex server. REWIND offers increased programmability by enabling in-memory persistent data structures and APIs as well as the low overhead of a user-mode library. REWIND currently targets individual applications and may require additional functionality for operation in multi-user environments. It provides, however, a critical mechanism to enable this new class of data management in NVM. Others have also argued for supporting a variety of storage models and infrastructures to meet the demands of different workloads (e.g., [27]).

The core of REWIND is a transactional recovery protocol based on WAL (Write-Ahead Logging). Unlike the ARIES implementations of DBMSs, REWIND provides programmers with direct control of what updates should be transactional through a simple API with a construct to mark transactions and a function call to the log operation. Logging calls are currently inserted manually by the programmer, but we expect them to be inserted transparently by the compiler, similarly to how Software Transactional Memory (STM) compilers work [11, 32]. Listing 1 is a function to remove an element from a doubly-linked list with the state of the list updated through CPU writes (lines 3 to 6). To make the operation recoverable, we enclose the critical updates in a persistent atomic block.

1  void remove(node* n) {
2    persistent atomic {
3      if (n == tail) tail = n->prv;
4      if (n == head) head = n->nxt;
5      if (n->prv) n->prv->nxt = n->nxt;
6      if (n->nxt) n->nxt->prv = n->prv;
7      delete(n); } } // end of atomic block
Listing 1: Removal from a doubly-linked list.

To provide programmers with the familiar access interface of current main memory data, we are restricted to physical logging and in-place updates. This differs from traditional disk-based systems where we are free to use logical logs or delay forcing user updates to improve performance. As REWIND is based on write-ahead logging, all updates to persistent data must be preceded by a call to the log function. This separates data from the log as the data are handled by the user program while the log is handled by REWIND. The resulting code, as expanded by the programmer or compiler, is shown in Listing 2. The runtime's transaction manager is called at the start of the block (line 2) to create a new transaction identifier: transaction management is transparent to the programmer and compiler. Logging calls (e.g., line 4) precede every critical update (e.g., line 5). Logging call parameters include the transaction identifier, the address of the memory location being updated, and the previous and new values.² At the end of the expanded code the commit call marks the end of the persistent atomic block. The de-allocation of the memory occupied by the removed node must be placed after transaction commit (line 16): without additional OS support, de-allocating memory is an operation that cannot be undone by REWIND.

1  void remove(node* n) {
2    int tID = tm->getNextID();
3    if (n == tail) {
4      tm->log(tID, &tail, tail, n->prv);
5      tail = n->prv; }
6    if (n == head) {
7      tm->log(tID, &head, head, n->nxt);
8      head = n->nxt; }
9    if (n->prv) {
10     tm->log(tID, &n->prv->nxt, n->prv->nxt, n->nxt);
11     n->prv->nxt = n->nxt; }
12   if (n->nxt) {
13     tm->log(tID, &n->nxt->prv, n->nxt->prv, n->prv);
14     n->nxt->prv = n->prv; }
15   tm->commit(tID);
16   delete(n); }
Listing 2: Expanded code for Listing 1.

We propose and evaluate four configurations of logging and transaction management in REWIND through deciding: (a) whether or not to force user updates to NVM as they happen; and (b) the number of logging layers to employ (one or two). Each configuration comes with its own tradeoffs.

Forcing/not forcing user updates. A force policy slows down logging due to the extra time needed to guarantee the persistence of the update. However, it only requires a two-phase recovery (analysis and undo) instead of the three-phase recovery (analysis, redo, and undo) of the no-force policy.
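The effect of the two policies on recovery can be illustrated with a toy model. The sketch below is not REWIND's implementation: the Machine type, its cache and nvm maps, and the committed set (which stands in for END log records and for the analysis phase) are hypothetical names introduced only for this illustration, and a crash is modelled simply as losing the cache contents while the log is assumed to be persistent.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// One physical log record: enough to redo (newVal) or undo (oldVal) a write.
struct LogRecord { int tx; std::string addr; int oldVal; int newVal; };

// Hypothetical toy machine: "nvm" survives a crash, "cache" does not.
struct Machine {
    bool force;                          // true: user writes go straight to NVM
    std::map<std::string, int> nvm;
    std::map<std::string, int> cache;
    std::vector<LogRecord> log;          // assumed persistent (write-ahead)
    std::set<int> committed;             // stands in for END log records

    explicit Machine(bool f) : force(f) {}

    int read(const std::string& a) const {
        auto c = cache.find(a);
        if (c != cache.end()) return c->second;
        auto n = nvm.find(a);
        return n != nvm.end() ? n->second : 0;
    }
    void write(int tx, const std::string& a, int v) {
        log.push_back({tx, a, read(a), v});    // log first, then update
        if (force) nvm[a] = v; else cache[a] = v;
    }
    void commit(int tx) { committed.insert(tx); }
    void crash() { cache.clear(); }            // volatile state is lost

    void recover() {
        if (!force)                            // redo phase (no-force only)
            for (const LogRecord& r : log) nvm[r.addr] = r.newVal;
        // undo phase: roll back uncommitted writes, newest first
        for (auto it = log.rbegin(); it != log.rend(); ++it)
            if (!committed.count(it->tx)) nvm[it->addr] = it->oldVal;
        log.clear();
    }
};

// Runs the same workload under one policy and returns the recovered NVM image.
std::map<std::string, int> run(bool force) {
    Machine m(force);
    m.write(1, "x", 10); m.write(1, "y", 20); m.commit(1);  // committed
    m.write(2, "x", 99);                                     // still in flight
    m.crash();
    m.recover();
    return m.nvm;
}
```

Under both policies the recovered state keeps the committed transaction and discards the in-flight one; the no-force run simply needs the extra redo pass because its committed writes never left the simulated cache.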
In short, forcing trades a slight slowdown during logging for faster recovery. The force policy also allows (without dictating, however) an alternative log clearing method instead of checkpoints. Each transaction can clear its own records immediately after commit, resulting in slower commits but eliminating checkpoints. Log clearing becomes more expensive as the number of concurrent transactions attempting to commit grows through increased locking congestion: clearing requires coarser-grained locks than adding records as it invalidates the iterators of concurrent threads. However, it utilizes memory better: memory is de-allocated right after commit and not after the checkpoint. It also minimizes the size of the log, which improves the time to find records of a transaction. In our implementation we combine the force policy with log clearing at commit-time.

² By address of a location we mean a persistent virtual address, e.g., that offered by [31], a relative address, or some other form of persistent reference to the memory location.

Number of logging layers. The logging infrastructure and the recovery manager offer two configurations of the log data structure. In its simplest form the log is a specially crafted recoverable persistent doubly-linked list. Alternatively, the log data structure is organized in two layers: an auxiliary data structure (an AVL tree) at the top layer, over the recoverable persistent doubly-linked list. Thus, the recovery manager caters for the recovery of up to three elements: programmer transaction, an optional complex log structure, and a fundamental and simple data structure. Recovery starts by recovering the simple data structure to a consistent state, whose contents are then used to recover the auxiliary log structure, if there is one. The contents of both primary and auxiliary log structures are used to recover the updates of the programmer transaction. The main tradeoff between these two variants is that logging is faster in the one-layer case but the two-layer log structure offers faster search, which benefits rolling transactions back.

3. THE RECOVERABLE LOG
3.1 Design overview
In NVM, the log is itself an in-memory non-volatile data structure and log updates require a series of CPU writes. Thus, updates for logging and recovering user updates must themselves be logged and recoverable. The challenge is to ensure atomicity and durability for log updates using a recovery mechanism. Note that user updates are now much cheaper, which makes traditional logging infrastructure heavyweight and calls for a low overhead design. DBMSs often use auxiliary data structures for indexing the log. Updating such structures may require a variable number of updates due to the need to re-organize the data structure. This makes maintaining transactional atomicity difficult in NVM. Our solution is to create a specialized data structure that embeds transactional logic and is able to recover itself in the case of a system failure. The data structure requires a constant number of operations to insert or remove an entry, so that its state can be tracked with only a few variables that can be updated and made persistent in a single, atomic, CPU write. This relies on only the last operation in the basic structure being pending, so we only need to log one operation. We also require the append/remove operations to be thread-safe. We force all updates on the basic data structure to be performed directly on NVM through: memory fences to force pending writes; non-temporal, synchronous, writes that bypass the cache and do not complete before reaching NVM; and cacheline flushes. All primitives are present in most instruction sets today, but they guarantee only write visibility within the memory consistency model of the machine; they do not guarantee persistence. We assume that when NVM systems become widely available they will also be able to guarantee persistence to NVM. Most work on persistent data structures for NVM makes similar assumptions (e.g., [5, 10, 12, 31]).

Based on these requirements we use a doubly-linked list as the basic log data structure. List nodes contain the log records, which contain information also found in ARIES, e.g., a transaction identifier and the old and new values of the affected memory location, etc. With carefully crafted code it is possible to make atomic node insertions and deletions over the linked list. This Atomic Doubly-Linked List (ADLL) is the key data structure for logging user updates. However, it requires linear search to locate an entry. An alternative is to index log records by transaction identifier and use the ADLL to log the pending updates to the index. The result is a two-layer configuration where the index (an AVL tree in our case) logs pending user updates and the basic data structure (i.e., the ADLL) logs pending REWIND updates to the complex data structure. The one-layer configuration offers faster logging but may lead to slower rollback; and vice-versa for the two-layer configuration.

3.2 One-layer logging: the Atomic Doubly-Linked List
Assuming that recovery and rollback are rare events, we can achieve faster logging at the cost of a slower retrieval of log entries. The only logging structure is the ADLL, so inserting a record costs a small constant number of writes. We optimize logging at the expense of more work during recovery. We do so by not keeping any transaction-specific state. At the price of a higher rollback/recovery cost we eliminate the transaction table during logging and reduce the number of variables we update; we only construct the transaction table during recovery. This departure from back-chaining is acceptable as we expect rollback/recovery to be rare events. Instead of rolling back one transaction at a time, we perform a single backward scan of the log and recover all transactions, at the expense of higher memory utilization (see Section 4.5). Rolling back a single transaction is not typical in system failures, but selective rollback is necessary to allow users to abort specific transactions. To achieve this we need to scan the entire log just for the rolled back transaction. Long-running transactions exacerbate this, as does the number of concurrent transactions: they increase the number of records between records of the transaction being rolled back that we need to skip. To rectify, we clear the log at checkpoints (Section 4.6): by tuning the checkpointing frequency we balance the insertion overhead against the rollback speed.

The ADLL is a keystone of REWIND as it enables the atomic insertion and removal of log records into/from the log in NVM. The ADLL is recoverable itself through: (a) use of single variables to log the internal state, which can be updated atomically in hardware; (b) recovery by redoing only the last operation: repeated redos, either partial or in full (due to further system failures in the middle of an ADLL recovery), are safe and leave the list in a correct state; (c) simple operations that make it easier to produce code with the redo recoverability property; and (d) performing all writes via non-temporal stores. The ADLL uses three logging variables: lastTail, the tail of the list before insertion; toAppend, the node to be appended; and toRemove, the node to be removed. Each list node points to the next and previous nodes, and to the actual log record. The latter is so we can create new records "off-line" and atomically insert/append them to the list.

Append. This operation involves creating the new node, updating the tail/head of the list (if needed) and the next pointer of the last tail. The operation for the ADLL is shown in Algorithm 1. Lines 5 and 10 mark the beginning and end, respectively, of the persistent operation. Line 5 corresponds to the critical update: it saves the node to be appended so the operation can be redone during recovery. If the system fails at any point before line 5, the state of the list is not altered, and thus consistent. If the system fails after line 5, the recovery operation (described next) will re-apply the append. Line 4 is not critical as it only sets lastTail and does not alter the state of the list. If the system crashes between lines 4 and 5 this value will be overwritten by the next append attempt. The order of the updates of lines 4 and 5 is critical for correct recovery. In line 6 the head of the list is updated, if necessary. This is not critical as it is designed so that it can be repeated multiple times during recovery. In lines 7 and 8 the next pointer of the tail and the tail itself are updated. If the system fails after line 10, the state of the list includes the new node and is consistent.

Algorithm 1: Append operation on the ADLL, invoked as part of the transaction manager's logging operation.
input: element E to insert
1   // set up new node
2   n = new Node(); n.element = E; n.prior = tail;
3   // undo information
4   lastTail = tail; // keep tail before logging last insertion
5   toAppend = n;
6   if head = NULL then head = n; // update head
7   if tail ≠ NULL then tail.next = n; // update tail
8   tail = n;
9   // append finished, clear undo
10  toAppend = NULL;

Recovery during append. We use the toAppend variable to identify the interrupted action: a non-NULL value implies an unfinished append operation. Thus, we need to repeat the critical section of the append. To allow the recovery code to be recoverable itself we use the lastTail variable, instead of tail used originally (line 7 of Algorithm 1). This resolves the problem of a crash between lines 8 and 10 of Algorithm 1 that would cause the second recovery to reinsert the node.

Removal. To guarantee atomicity and recoverability of removals we follow the same principles. We store the node to remove in the toRemove variable at the beginning of the critical section, similarly to toAppend. To recover, we repeat the removal code, which is designed to be safely re-executed. We first check the toRemove variable to identify if the system crashed during removal and repeat the process.

ADLL recovery. We recover by first identifying the interrupted operation (append or removal) by checking the toAppend and toRemove variables. Then, we repeat the appropriate operation as discussed.

3.3 Optimizing the log structure
Appending a record to the ADLL through Algorithm 1 requires multiple non-temporal stores and bears overhead due to the write latency and the use of fences. Moreover, writes refer to non-consecutive locations (the list's nodes), which forbids packing them to fewer cachelines. We can significantly reduce the write overhead by changing the memory layout of the data structure by blocking multiple records into fixed-size buckets represented as arrays, as shown in Figure 2. After creating a log record we place it into a bucket with one write, rendering insertion both atomic and cheap. The log is resized by atomically appending new buckets to the ADLL. This layout uses cheap array appends and amortizes the cost of atomic expansion: instead of single nodes, we insert buckets. The recovery algorithms are unaffected by the new structure. The only exception is that we need to keep the next position in the last bucket to insert a new record. Doing so through a non-temporal store would increase the insertion cost. Instead, we reconstruct the information during the analysis phase in the event of a crash. We initialize the cells of each bucket to zero, and, during analysis, we identify the last occupied cell after skipping all empty cells cleared by the log clearing process.

Figure 2: Minimizing the write overhead.

Clearing the log. Removing log records from the hybrid structure is more involved due to the need to shift records to fill removed record gaps. Doing this atomically is expensive and adds unnecessary complexity. We avoid it by allowing marked gaps in a bucket, keeping count of occupied cells, and removing a bucket when it becomes empty. We do not explicitly store bucket counts, but, in the event of a crash, we reconstruct them through the marked gaps. We thus simplify record removal but may waste memory in long-running transactions. Under both force policies, the records of long-running transactions can span multiple buckets, thus preventing bucket removal. We can tune bucket size to balance the impact of long-running transactions. Alternatively, we can compact the log if its occupancy drops below some threshold by creating a new log, copying records over, and atomically changing the pointer to the head bucket.

Multiple log records per cacheline. A key challenge in keeping user data in NVM is the lack of control over when their updates become persistent, preventing DBMS-like optimizations [10, 27] where the log tail is
flushed from memory to persistent storage in batches. This guarantees the persistence of log records and allows the packing of writes in cachelines in NVM, but it is not possible when user data is also in NVM: delaying log writes may cause data writes to overtake their log records, violating the WAL protocol. In REWIND, we can perform similar optimizations over our hybrid log. Multiple records are packed into a single cacheline since the record pointers are stored in consecutive memory locations. This does not require the log records themselves to be stored together in memory. The compiler needs to reorder the log calls and place them in batches above the corresponding user writes. This guarantees the log writes are not overtaken by user writes and records are placed in one cacheline. With 64-byte cachelines and 8-byte pointers we need just a single fence and a single non-temporal store for every eight log records. This also mitigates the cost of the fence and the group size serves as a tuning knob for adjusting to different fence latencies. Commit log records can be reordered safely before the user writes as the log records that precede them guarantee recoverability. Even if the commit records cannot be moved, we can move all preceding records and proceed as before. This requires the cacheline be written atomically since we only assume the hardware can guarantee single-word atomic writes. We do this by keeping the position in the bucket up to which log records are guaranteed to be persistent. This is set to zero when the bucket is created. Then, it is updated after we issue a memory fence (using a non-temporal store) with the position of the last record. This guarantees that all log records up to that point are persistent. If a cacheline is not intentionally flushed, this index is not updated. This is vital for ensuring correctness, as during recovery we only consider log records up to the last persistent index. We issue a memory fence/index update for every ⌊|cacheline| / |pointer|⌋ records; or when the bucket is full; or when we find an END record.
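The batching rule above can be sketched as follows. This is only an illustrative model under stated assumptions: Bucket, Record, persistedUpTo, kCapacity and the fences counter are hypothetical names, and std::atomic_thread_fence merely stands in for a fence with persistence guarantees, which standard C++ does not provide.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <cstddef>

constexpr std::size_t kCachelineBytes = 64;   // figures used in the text
constexpr std::size_t kPointerBytes = 8;
constexpr std::size_t kGroup = kCachelineBytes / kPointerBytes;  // 8 records

enum class Type { UPDATE, END };
struct Record { int tx; Type type; };

// Hypothetical bucket of record pointers with a "persistent up to" index.
struct Bucket {
    static constexpr std::size_t kCapacity = 32;
    std::array<const Record*, kCapacity> cells{};  // zeroed: nullptr = empty cell
    std::size_t next = 0;           // insertion position, rebuilt after a crash
    std::size_t persistedUpTo = 0;  // cells [0, persistedUpTo) are known persistent
    int fences = 0;                 // counts fence/index updates, for illustration

    bool append(const Record* r) {
        if (next == kCapacity) return false;       // caller must add a new bucket
        cells[next++] = r;
        // Fence once per full cacheline of pointers, on a full bucket,
        // or as soon as an END record is appended.
        if (next % kGroup == 0 || next == kCapacity || r->type == Type::END)
            persist();
        return true;
    }

    void persist() {
        // Stand-in only: orders memory operations but gives no persistence guarantee.
        std::atomic_thread_fence(std::memory_order_seq_cst);
        persistedUpTo = next;       // would be a non-temporal store on real NVM
        ++fences;
    }

    // Recovery only trusts records below the persistent index.
    std::size_t recoverableCount() const { return persistedUpTo; }
};
```

With these constants, eight UPDATE records cost one fence, three further records stay unconfirmed until the next group boundary, and an END record forces an immediate fence so that a completed commit is never rolled back after a crash.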
Fencing at END records is important since they mark the completion of commit/rollback. Delaying the update of the last persistent position after a commit may lead to having to abort a completed transaction after a crash.

3.4 Two-layer logging: the Atomic AVL Tree
To improve search in the ADLL we use an auxiliary structure: an Atomic AVL Tree (AAVLT), which indexes log records by their identifier and is recoverable by maintaining a log of its internal operations in the ADLL. The most intensive logging activity is during rebalancing on insertion/removal. We use the optimized version of the ADLL. Every update to the AAVLT is only executed by a single thread and forwarded directly to NVM. Doing so allows us to log only the last operation on the AAVLT and clear the log entries after completion, thus reducing the length of the ADLL. In terms of AAVLT insertion and removal: (a) we log all the writes that affect the state of the structure, and (b) we delay the de-allocation of the removed nodes until after the successful completion of the operation. The logging and recovery implementation is a simplified version of the recovery scheme of Section 4. We also skip the analysis phase as there is only one transaction to undo. Rollback also largely remains the same with the only issue being locating the next log record to undo. This is straightforward for a normal rollback (the previous entry), but after a crash during the rollback operation itself we need to skip all records that were previously undone and continue from that point. We do this in the same way as in the recovery of one-layer logging (see Section 4.5). Finally, we clear log entries after each AAVLT operation as we describe in Section 4.6 for the force policy.

4. THE RECOVERY RUNTIME
4.1 Transaction recovery management
The transaction recovery manager maintains two structures: the log and the transaction table. The log tracks the program's writes. The format of these records is standard and includes the record ID, the transaction ID, the record type, the old and new values, the address of the memory location modified, and pointers to other records. The transaction table stores information about the active transactions. Transaction table entries include the transaction ID, its status, the ID of the last record of the transaction, and the ID of the record to undo next. The transaction table is constructed during recovery in all configurations but is maintained during logging in the two-layer configuration. There is no need for a dirty page table as NVMs are byte-addressable. The transaction recovery manager constructs the transaction table at application start and determines whether a system or application crash occurred, in which case recovery is performed, or whether this is a clean start.

4.2 Logging
Under the WAL protocol a log record must be persisted before the corresponding persistent write. We use this approach for the ADLL, the AAVLT in the two-layer configuration, and the programmer data (Section 2). We use physical logging instead of logical logging as it fits better with imperative languages. DBMSs enforce WAL with system calls and a synchronous I/O interface. In REWIND, we must enforce WAL for CPU writes that pass through a complex memory hierarchy and may be re-ordered before reaching NVM. We use a simplified version of the original ARIES log function with the key difference being that the dirty page table is absent, as we do not have pages. For one-layer logging the transaction table is also absent during logging and only re-constructed during recovery. Log records are created given appropriate parameters and then a memory fence is issued to ensure the record fields have reached the memory. After that, the record is inserted atomically to the log. If two-layer logging is used, the record is inserted into the AAVLT and the AAVLT maintenance operations are logged instead. As we discussed in Section 3.3 we can reduce the number of fences required by moving groups of log records before the writes and then issuing a single fence.

4.3 Commit
The log function guarantees that the relevant log records are in NVM upon commit. Under a force policy, all updates of a transaction must be in NVM by the time a transaction commits. We do this by directly updating NVM using non-temporal stores and follow it, at commit-time, with a memory fence and an END log record. We may also then remove the log entries of this transaction. In no-force configurations, all we need is to insert the END log record at commit-time. The log entries of committed transactions are cleared in the background by checkpointing (as we will see in Section 4.6).

ARIES follows a no-force policy to improve I/O when writing log pages and dirty pages to disk. In NVM, persisting the log entries is as expensive as making the updates themselves persistent. ARIES uses a steal policy, which in our case is inapplicable as there is no buffer-pool. Committing in ARIES explicitly forces any in-memory log entries to persistent storage. This is not required in REWIND as log entries are immediately made persistent (through non-temporal stores). This is a novel requirement in NVM-based systems to prevent reordering of writes in the memory hierarchy from breaking the WAL protocol.

Memory de-allocation (e.g., line 7 of Listing 1) requires special handling for recoverability. In no-force configurations, we delay memory de-allocation until the corresponding log entry is processed at the next checkpoint (see Section 4.6). The de-allocation details are stored in a special DELETE record. In force configurations, we postpone memory de-allocation until after committing (as in line 16 of Listing 2). We also rely on a DELETE record to handle a system failure between commit and the actual de-allocation.

4.4 Rollback
Transaction rollback in REWIND (either explicitly or as a result of a system failure) proceeds as follows. In one-layer logging, rollback is a trivial backward scan. The situation is more complicated in two-layer logging, where we selectively scan the log for the transaction being rolled back through the AAVLT. The rollback can be repeated an unlimited number of times through the use of CLRs that log undo operations. As we use physical logging, undo sets a variable to its old value. Note that under the force policy the undos should be made persistent as well. This is required to be able to clear the logs after the rollback. One complication is that we need to redo the last CLR when we recover from a crashed rollback. This protects from the corner case of a crash after the creation of the last CLR but before the corresponding update was made persistent. Finally, we mark the successful rollback completion by writing an END log record.

4.5 Recovery
To recover, we must first recover the log itself.
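Recovering the log itself, as described for the ADLL append in Section 3.2, can be sketched as follows. This is a minimal single-threaded model rather than the paper's code: crashes are simulated by returning early after a chosen line of Algorithm 1 (the crashAfterLine parameter is hypothetical), ordinary stores stand in for non-temporal stores, and removal via toRemove is omitted.

```cpp
#include <cassert>

struct Node {
    int element;
    Node* prior = nullptr;
    Node* next = nullptr;
};

struct ADLL {
    Node* head = nullptr;
    Node* tail = nullptr;
    Node* lastTail = nullptr;   // logging variable: tail before the insertion
    Node* toAppend = nullptr;   // logging variable: pending node, if any

    // Follows Algorithm 1; crashAfterLine (hypothetical) simulates a failure
    // right after the given line. Returns false if the "crash" happened.
    bool append(Node* n, int crashAfterLine = 0) {
        n->prior = tail;                          // line 2
        lastTail = tail;                          // line 4
        if (crashAfterLine == 4) return false;
        toAppend = n;                             // line 5: critical update
        if (crashAfterLine == 5) return false;
        if (head == nullptr) head = n;            // line 6
        if (crashAfterLine == 6) return false;
        if (tail != nullptr) tail->next = n;      // line 7
        if (crashAfterLine == 7) return false;
        tail = n;                                 // line 8
        if (crashAfterLine == 8) return false;
        toAppend = nullptr;                       // line 10: append finished
        return true;
    }

    // Redo of the pending append. It links through lastTail rather than tail,
    // so it stays correct even if tail already points at the new node, and it
    // can be repeated any number of times.
    void recover() {
        if (toAppend == nullptr) return;          // nothing was interrupted
        Node* n = toAppend;
        n->prior = lastTail;
        if (head == nullptr) head = n;
        if (lastTail != nullptr) lastTail->next = n;
        tail = n;
        toAppend = nullptr;
    }
};

// Counts nodes by walking forward from head.
int length(const ADLL& l) {
    int count = 0;
    for (const Node* p = l.head; p != nullptr; p = p->next) ++count;
    return count;
}
```

Linking through lastTail is what makes the redo safe for a crash between lines 8 and 10: at that point tail already equals the new node, so reusing tail would link the node to itself.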
rstrecoverthelogitself.Thisisfollowedbyeitherthreephases(analysis,redo,andundo)ortwophases(analysisandundo)dependingontheforce/no-forcecon guration.ARIESandDBMSsexploittheI/Osubstratetopresentaconsistentandpersistentlogstructureincaseofsystemfailuresduringlogwrites.InourcasethelogisinNVM.Thus,werequirecustommechanismstorecoverfrominterruptedlogupdates(seealsoSection3).Whenrecovery nishes,wealsoclearthetransactiontableasalltransactionsarehenceforthconsideredcompleted.Afterrecoveringthelog,theanalysisphasereconstructsthetransactiontablebyscanningthelogforwardtothepointoffailure.Then,intheno-force/three-phasecon g-urationonly,wescanthelogforwardagainandredoallwrites.Theredophasehandlesacrashduringapreviousrollback,asitensuresthatallundosareredoneandconse-quentlynotlostduringthesecondcrash.Inthethirdphaseweconsultthetransactiontabletoundoalluncommittedtransactions.Theundoimplementationdependsonwhetherweuseone-ortwo-layerloggingaswewilldiscussshortly.Aftercompletingrecovery,andunderaforcepolicy,weknowthatalltransactionsarecompleted|eithercommittedoraborted.Thus,wecleartheloginthreesteps:(a)keepthepointertotheloginatemporaryvariable;(b)createanewlog;and(c)de-allocatetheoldlog.De-allocatingtheentirelogisfastercomparedtoindividuallyremovingitsrecords.Two-layerloggingForeachun nishedtransaction,weupdateitsstatusasbeingabortedandscanitslogrecordsbackwardsbyfollowingtheundoNextLogIDpointers:theIDofthenextrecordtoundo;weretrieveeachrecordthroughtheAAVLTandcalltherollbackfunction.Then,wewriteENDrecordsforallabortedtransactions.Intheforcepolicy,toaddressthecornercaseofacrashbetweenthelastCLRandthecorrespondinguserwrite,weredothelastCLR.One-layerloggingThisissimilartoundointwo-layerloggingwithtwomaindi erences:First,selectivelyscanningthelogistooinecientsoweimplementacustomundoprocess(showninAlgorithm2)byundoingalluncommittedtransactionsinasinglebackwardscan.Second,duringthescanwetrackthelastCLR(undo)recordofalltransactionsatanun 
nishedrollbackstatewiththeaidofanauxiliarydatastructure.WeusethistoskiptheUPDATErecordsthathavealreadybeenabortedsowecan ndthenextrecordtoundowithoutusingtheundoNextLogIDpointer.4.6LogcheckpointingReducingthesizeofthelogisanimportantrequirementofREWINDas:(a)despitetheirgoodscalability,NVMcapacitieswilllikelylagbehindthoseofdisk,and(b)the ne-grainedloggingofREWINDleadstolargermetadatasizes.Keepingthelogsmalliscriticalinone-layerloggingtoreducethecostofscanning.Theremovaloflogrecordsdependsonthecon guration.Whenforcing,weclearthelogrecordsrightafteratransactioncommits/rollbacks.Inano-forcepolicytherecordsareremovedatcheckpoints.Atacheckpoint,thecacheis ushedtomakeallpending Algorithm2:Undooperation(one-layerlogging)in-vokedduringrecovery. 1whileADLLLog.hasPrior()do 2rec=ADLLLog.prior();3xact=transactionTable[rec.xactID];4ifxact.status=RUNNINGorxact.status=ABORTEDthen 5ifxact.status=RUNNINGthen 6ADLLLog.insert(xact.xactID,ROLLBACK); 7ifrec.type=CLRthen 8ifundoMap.[rec.xactID]6=NULLthen 9undoMap[rec.xactID]=rec.undoLogID;10ifforcepolicythenrec.redo(); 11elseifrec.type=UPDATEandrec.isUndoablethen 12ifundoMap[rec.xactID]6=NULL13andundoMap[rec.xactID]rec.logIDthen 14//extraargumentsforCLRrecordomitted15ADLLLog.insert(xact.ID,CLR,...);16rec.undo(); 17//AddENDrecords18whiletransactionTable.hasNext()do 19xact=transactionTable.next();20ifxact.type6=FINISHEDthenADLLLog.insert(xact.ID,END); writespersistent.Regardlessofthemethodused,wehavetoupdatetheloginarecoverableway.Wethusatomicallyremoveeachtransaction'sENDlogrecordasthelastoper-ationtoguaranteethat,afteracrashduringclearing,thenextattemptwillbeperformedinexactlythesameway.Toclearthelogwhenforcing,wescanthelogbackwardsandremovetherecordsofcommittedtransactions.Acheck-pointunderano-forcepolicyismorecomplex.Itisdesignedasa\cache-consistent"checkpointtoallow 
ne-grainedlock-ing.Thisforcesascanofthelog,butallowsconcurrenttransactionstokeepusingthelog,whichispossibleastrans-actionsonlyappendtothelogwhilecheckpointingremovesrecordsfromthemiddle.WeinsertaCHECKPOINTrecordbeforethecache ushtomarkthepointinthelogthatisper-sistent;allrecordsbeforethatpointcanbesafelyremoved.WedothisbyremovingENDrecordslast.Issuing rstthecache ushandthentheCHECKPOINTrecordcouldleadtonewlyinsertedrecordsappearingtobepersistent.4.7ConcurrencyREWINDallowslow-overhead, ne-grainedconcurrency.Weusesimplelockstoserializelogaccessandensuretraversals(duringacheckpoint)arethread-safe.Theone-layer/no-forcecon gurationo ersthe nest-grainedconcur-rencyduetothesimplelogstructure,whichallowsustolockthelogonlybrie yduringinsertionorremoval.REWINDcouldfurtherbene tfromalock-freeADLLbutthisisleftforfuturework.Thread-safeaccesstouserdatabymultipletransactionsinREWINDisuptotheprogrammer.ThisisduetoREWIND'simperativelanguagenature,whichallowstheprogrammertoarbitrarilyupdatedata.5.PERFORMANCEEVALUATIONWeimplementedREWINDinC++(usingg++4:7:3)toevaluateitsperformance.Weusedaquad-coreIntel®XeonE5420clockedat2:5GHzpercorewith12GBoffullybu eredDDR2memoryrunningtheGNU/Linux3.9kernel.WeemulatedNVMbyaddinglatencythroughabusyloop(seealso[31])precededbyacacheline ushandfollowedbyamemoryfence.Thelatencyemulationisinlinedbe-foreaccessingNVM.Weconsidereverynon-temporalstoreasanindividualNVMwrite,butgroupconsecutivewritestothesamecachelineintoasingleNVMwrite.WesettheNVMwritelatencyto510processorcycles(150ns).Wedo notmodelahigherNVMreadlatencythanDRAMbecause:(a)thetwoarealreadycomparableincurrentNVMtech-nology[25];and(b)transactionprocessingisupdate-heavysowritesa ectperformancethemost.WecompareREWINDtoStasis[27],astate-of-the-artstoragemanagerforpersistentdatastructures.Stasisem-ploysdata-structure-speci 
cpersistenceandrecoveryop-timizationsasopposedtogeneralDBMS-basedrecoverymechanisms.WealsocomparetothepopularBerkeleyDB.Finally,weincludetheversionofShore-MTfrom[33],whichisheavilymodi edforpersistentmemory.Allapproachesworkoverblockdevices,soaneasywaytoportthemtoNVMwouldhavebeentorunthemonamemory-mounted lesystem(e.g.,RAMFS).Wefollowedadi erentap-proachandusedPMFS[9]:akernel-level lesystemthatismemory-mountedandbyte-addressable.PMFSguaranteespersistencethroughstandard lesystemcalls,butitsim-plementationisoptimizedforbyte-addressability,thusmin-imizingtheoverheadoverNVM.Thus,itdoesnotadverselyimpacttheperformanceofStasis,BerkeleyDBorShore-MT.Wefurtherfavorthetwoformersystemsbyonlycharginglatenciesforuser-datawritestoPMFSandnotforPMFS'sinternalbookkeepingwrites.WealsofavorShore-MTbydisablinganylatenciesusedin[33].WedonotcomparetoapproacheslikeMnemosyne[31]orNV-Heaps[5]sincetheydonotprovidefullloggingandrecoveryfunctionalityandarethuscomplementary(seealsoSection6).WeusedBerkeleyDBversion6:0:20deployedasin[27].Thecacheandlogbu ersizesmatchedthoseofStasis.Thelockmanagerwasdisabledtofurtherimproveperformance.ForShore-MTweusedthetransaction-levelpartitioningvariantwithdurable-cacheenabledandsimilarcon gurationwiththeothertwosystems.WerefertothethreeversionsofREWINDasSimple,OptimizedandBatchandthesecorre-spondtothedoubly-linked-list(Section3.2),hybriddoubly-linked-list(Section3.3)andhybriddoubly-linked-listwithbatchedlogrecords(Section3.3)implementations.Wecon- guredtheOptimizedversionwithabucketsizeof1;000recordsandtheBatchversionwitha64-bytecachelinesizeand8-bytepointersizetomatchourhardware.InSec-tion5.1weuseOptimizedREWINDforallone-layercon- gurationsasthisisthecon 
gurationusedasthebottomlayerofthetwo-layerapproach.Allresultsaretheaverageofthreerunswithstandarddeviationaverageof1:4%.5.1SensitivityanalysisLoggingoverheadWemeasuretheoverheadofloggingasafunctionofthenumberofmemorystores.Weimplementedamicrobenchmarkwithasingletransactionthatalternatesbetweenupdatinganin-memorytableandperformingsomecomputationbetweenupdates.Thetransactionsuccessfullycommitsattheend.Wecalibratedthecomputationcosttobeamultipleofthecostofanon-loggedstoretoNVM.IntheleftplotofFigure3weshowtheloggingoverheadasafunctionofthefractionoftimespentonupdates;theoverheadisreportedastheratiobetweentheperformanceofREWINDandthenon-recoverableimplementationoverNVMe.g.,aratioof5meansREWINDis5xslower.Wetestedallfourcon gurations:two-layerorone-layerlogging(2Lor1L);andforceorno-forcepolicies(FPorNFP).Therightmostpointoftheplotsrepresentstheworstcase:theuserprogramonlyupdatescriticaldata.Then,theover-headsofthetwo-layercon gurationsarehighercompared Figure3:Loggingoverheadasafunctionofupdateintensity(left)andnumberofskiprecords(right).totheone-layercon gurations.Thelowoverheadsoftheone-layercon gurationsshowthee ectivenessoftheOpti-mizedimplementationofREWIND.Thedi erencebetweentheoverheadsofthetwo-layerandone-layercon gurationsstemfromthecostofusingtheAVLtreeandmaintainingthetransactiontable.Thetotaloverheaddecreasessteeplyastheintensityofupdatesdecreases.Fora10%updateintensity,theoveralloverheaddropstoonly1:5fortheone-layerno-forcecon gurationand8:5forthetwo-layerno-forcecon guration.Thedi erenceinloggingoverheadbetweentheforceandno-forcepoliciesisnotasdramatic,especiallyforone-layerlogging,butitisstillsigni cant.Tobetterconveytheinformationwehavemagni 
edtheplotfortheone-layerrunsatthebottomofthegraph.Forone-layerloggingtheoverheadvariesbetween2%to35%andfortwo-layerloggingbetween24%to74%.Theincreasedloggingoverheadoftheforcepolicyisduetothemoreexpensivenon-temporalwritestoNVMfortheuserupdatesandfromtheextraworktoclearthelogatcommit(Section4.3).Wenextfocusonthecomparisonofone-andtwo-layerloggingunderaforcepolicy.Recallthatcommitsinone-layerloggingrequirelinearscansofthelog,whichbecomemoreexpensivewithmoreinterleavinglogentries(amea-sureofthenumberofconcurrentlyrunningtransactions).Wetermsuchentriesskiprecords,astheywillneedtobeskippedifthistransactionistobeselectivelyprocessed.Two-layerloggingrecti esthisthroughtheAVLindex.Wechangedourmicrobenchmarktogenerateavariablenum-berofrecordsfromothertransactionsbetweenrecordsofaspeci ctransaction.Alltransactionsupdatethesamein-memorytable,sotheycorrespondtotheworst-case100%update-intensiveworkloadofthepreviousexperiment.Thenumberofskiprecordsvariedfrom100to1;000.Thismightseemlikeasmallnumber,butrecallthatREWINDrunsinuser-modeandinasingleapplicationcontext.Skiprecordscorrespondtothenumberofinterveningconcurrentupdatesofasharedresourceinasinglecontext(performed,perhaps,bymultiplethreads),soasmallernumberofsuchrecordsisenoughtomeasuretheoverheadandsucienttoindicatetheperformancetrendsofeachREWINDcon guration.IntherightplotofFigure3wereporttheoverheadoftheone-andtwo-layercon gurationsasafunctionofthenumberofskiprecords.Theoverheadisagainexpressedastheratioovertheperformanceofthenon-recoverablever-sionofthesamemicrobenchmark.Inone-layerloggingtheoverheadgrowssharplywiththenumberofskiprecords.Intwo-layerlogging,ontheotherhand,theoverheadisrela-tively xed.Inreality,italsogrowswiththenumberofskiprecords,butatsuchaslowratethatitisuntraceableintheplot.Eventhoughone-layerloggingstartso performingbetterthantwo-layerlogging,itsdegradationasthenum-berofskiprecordsgrowsissoseverethatthetwo-layer 
Figure4:Single-transactionrollback(left)andre-covery(right)foravaryingnumberofskiprecords.con gurationoutperformsitafteraround600skiprecords.ThissuggeststhatinauserapplicationthedecisionofwhichREWINDcon gurationtoemployisnotaclearoneastherewillbeacrossoverpointbeyondwhichthetwo-layercon g-urationstartsshowingitsmerits.Itisuptotheusertodecideiftheconcurrencyneedsoftheapplicationarehighenoughfortwo-layerloggingtobethebestchoice.RollbackandrecoverycostsOurpurposeistoassesstheimpactofthenumberoflogginglayersontheperfor-manceofsingletransactionrollback.Weusethesamemi-crobenchmarkasbefore,butinsteadofcommittingthetar-getedtransactionwerollitback.IntheleftplotofFigure4weshowtherollbackduration(inmilliseconds)asafunc-tionofthenumberofskiprecordsfortheone-andtwo-layercon gurationsandforaforcepolicy.Therollbacktimeoftheone-layercon gurationgrowsfasterthanthatofthetwo-layercon gurationasweincreasethenumberofskiprecords.Thetwo-layercon gurationcatchesupwiththeone-layeroneataround400skiprecords.Aswasthecaseforcommit,thissuggeststhatthetwo-layercon gurationexhibitsitsmeritsafterasucientnumberofskiprecords.Again,theprogrammershouldcustomizetheREWINDcon- gurationfortheexpectedapplicationworkload.REWINDitselfcanbetunedtoadapttovariousworkloads.Nextwereportthecostofabortingasingleuncommittedtransactionduringrecovery,insteadofrollingitbackdur-ingnormaloperation,againasafunctionofthenumberofskiprecords.Thiscaseappearswhenatransactionstartsitscommitprotocol,butdoesnot nishcommitting(i.e.,itdoesnotloganENDrecord);itmustthenbeabortedduringrecovery.Thiscontinuestheanalysisofthechoicebetweenone-ortwo-layercon 
gurations,butinamorecontrivedsce-nario.Weextendedthemicrobenchmarktocommitallothertransactionsbutthetargetone,butwithoutclearingthemfromthelogsotheirentrieshavetobeskippedwhenrecov-eringthetarget.ThatcouldhappenifthesystemcrashedafterthesetransactionsloggedtheirENDlogged(sothesys-temwillnottrytoabortthem)butbeforeclearingthelog.IntherightplotofFigure4wereporttherecoverytimeasafunctionofthenumberofskiprecordsfortheone-andtwo-layercon gurationsandwithaforcepolicy.One-layerloggingnowsigni cantlyoutperformsthetwo-layercon g-uration.Althoughtwo-layerloggingperformsbetterduringtheundophase,andforselectivetransactionrollback,itisswampedbythesloweriterationoverthelogcontentsduringtheanalysis/redophasesthusgreatlyexacerbatingtherecoverytime.Thiscontraststheearlierresultswhereone-layerloggingwasoutperformedbytwo-layerloggingandreinforcestheintricaciesofchoosingacon guration.Wenowreportthetotalprocessingcost(loggingpluscom-mitorrecovery)asafunctionofthelikelihoodthattrans-actionsarerecovered.Weextendedthemicrobenchmarkto Figure5:Loggingandrecoverycostasafunctionofthefractionofrecoveredtransactions. 
Figure6:Impactofcheckpointingfrequency.selectavaryingnumberoftransactionstoberecoveredandtimedboththeloggingandthecommitorrecoveryprocessofalltransactions.InFigure5weshowthetotaltimeasafunctionofthefractionoftransactionsthatneedtoberecovered,fortheone-layercon gurationwithbothforceandno-forcepoliciesandwiththreevaluesofskiprecords:10,150and300.Forbothpolicieswefactoroutthedura-tionoflogclearingtocomparethemethodsirrespectivelyofwhetherclearingisimmediateorthroughcheckpoints.Wedonotconsiderthetwo-layercon gurationaswehaveal-readyseenitperformworsethantheone-layeroneintermsofbothrecoveryandlogging.Theexecutiontimeissen-sitivetothenumberofskiprecordsgiventhedependencybetweenrollback/recoverycostandthevalueofthisparam-eter,aswasshownearlier.Recallthattheno-forcepolicyrequirestwophasesduringrecovery,whereastheforcepolicyrequiresthree.Observethenthattheno-forcepolicyhasaslightadvantageforthesamenumberofskiprecordsandaverylowcrashprobability.Itiseventuallyoutperformedbytheforcepolicybecauseoftheextrarecoveryphase.Thisismoreevidentasthenumberofskiprecordsincreasesbecausethedurationoftheextraphaseincreasesaswell.CheckpointoverheadTomeasurethecheckpointover-headweinsertedtenmillionlogrecordsinthethreeREWINDversions,con guredwithone-layerloggingandano-forcepolicy.Werantheinsertionsforeachcon 
g-urationwithandwithoutcheckpointsandwereporttheoverheadofthecheckpointedrunasthepercentageofnon-checkpointedexecutionforavaryingcheckpointfrequency.Overall,theoverheaddeclineswithdecreasingcheckpointfrequency.However,theoverheadintheSimpleversionismoreseverecomparedtotheothertwoversions.Thisisduetothecoarserdegreeofconcurrency:theSimpleapproachneedstolockandserializetheinsertionofanewrecordtotheADLLwhiletheothermethodsonlyapplyasingleup-datetoabucket.AsshowninFigure6,theoverheadsoftheSimple,Optimized,andBatchREWINDversionsvaryfrom79%to60%,32%to9%,and20%to3%respectively.5.2ComplextransactionalworkloadsLoggingWeevaluatetheoverheadofREWINDwhenre-coveringdatastoredinaB+-tree.Wetestedeightcon- Figure7:B+-treeloggingperformanceforREWINDvs.norecoverability(left);REWINDvs.Stasis,BerkeleyDBandShore-MT(right). gurations:DRAM,withoutpersistenceorrecoverability;NVMwithpersistencebutwithoutrecoverability;thethreeREWINDversionsrunningonNVM;andStasis,Berke-leyDBandShore-MTrunningonNVM.Thelastsixcon- gurationsguaranteepersistenceandarerecoverable.AllREWINDversionswerecon guredwithano-forcepol-icyandwithoutcheckpoints.Weimplementedonein-memoryB+-treeversionforeachdi erentpersistencelayer:REWIND,Stasis,BerkeleyDB,Shore-MT.WeloadedtheB+-treewith100k32-byte-longrecordsandperformedamixof200klookupsandupdatesaswevariedtheread/updateratio.Theupdateswereequallydividedbetweeninsertionsanddeletionsforaconstanttreesizeperread/updateratio.IntheleftplotofFigure7weshowthetotalexecutiontimefortheworkloadasafunctionofthefractionofupdatequeries.TheoverheadoftheDRAMandNVMimplemen-tationsgrowswiththefractionofupdates,albeitgently,asupdatesaremoreexpensivethanlookups.Thisisexagger-atedintheNVMimplementationbecauseoftheoverheadofNVMwrites.AllREWINDcon 
gurationsperformwellandclosetotheDRAMandNVMimplementations.TheOptimizedversionimprovestheSimpleversionby27%andtheBatchversionimprovesitby37%.WethereforefocusontheREWINDBatchvariantfromnowon.IntherightplotofFigure7wecomparetheoverheadofREWINDtoStasis,BerkeleyDB,andShore-MT.REWINDoutperformsStasisby85,BerkeleyDBby105andShore-MTby205at100%updatequeries.ThisisduetoREWIND'smin-imalisticdesign,leanersoftwarestack,andNVM-speci coptimizations.Shore-MTisoutperformedasitisoptimizedformulti-threadedperformancewhiletheworkloadissingle-threaded.InSection5.2weshowhowShore-MTscalesbet-terthanBerkeleyDBandStasisinmulti-threadedmode.RollbackandrecoveryWereportthecostoftransac-tionrollbackasafunctionofthenumberofoperations.Westartedwitha100k-recordB+-treeandtheninvokedamixedworkloadofanequalnumberofrandomlydistributedinsertionsanddeletions.ThiskeepsthesizeoftheB+-treesmall,butgeneratesalargenumberoflogrecords.Were-porttheresultsintheleftplotofFigure8.REWINDBatchoutperformsStasisby30,BerkeleyDBby12andShore-MTby4.ThisisduetotheREWINDalgorithmsanditsminimalphysical ne-grainedlogging,asopposedtothelog-icalloggingofStasis,orthecoarse-grained,page-levellog-gingofBerkeleyDBandShore-MT.Shore-MT'sexcellentperformanceisduetoundobu erskeepingtheundologrecordsinmemory.IntherightplotofFigure8wereportthecostoffullrecoveryformultipletransactions.Weusedthesamesetupbutnowwecreatedanewtransactionev-ery200operations.Thus,thenumberoftransactionsvaries Figure8:B+-treerecoveryforsingle(left);andmul-tipletransactions(right).from400to4;000.REWINDoutperformsStasisby20,BerkeleyDBby14andShore-MTby8.Thisisduetothelowerper-transactionoverheadofREWINDandone-layerloggingdoingawaywiththetransactiontable.CoupledwiththeecientNVM-speci cimplementation,theresultisalargeperformancemarginoverthecompetition. 
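The physical, fine-grained undo logging that these rollback and recovery results rely on (Sections 4.4 and 4.5) can be illustrated with a minimal sketch. It is not REWIND's implementation: the names UndoLog, LogRecord and RecType are hypothetical, the log is an ordinary volatile std::vector rather than a persistent ADLL, no fences or non-temporal stores are issued, CLRs are omitted, and the analysis and undo phases are collapsed into one backward scan. It only shows the core idea that each update logs the old value before the write, and that uncommitted writes are undone by restoring old values in reverse order.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical record layout; a real log record would carry more fields.
enum class RecType { Update, End };

struct LogRecord {
    RecType type;
    int xactId;
    std::int64_t* addr;     // location that was overwritten (Update only)
    std::int64_t oldValue;  // value to restore on undo (Update only)
};

class UndoLog {
public:
    // Log the old value first, then perform the write (WAL order).
    void write(int xactId, std::int64_t* addr, std::int64_t newValue) {
        assert(addr != nullptr);
        records_.push_back({RecType::Update, xactId, addr, *addr});
        *addr = newValue;
    }

    // Mark a transaction as finished by appending an END record.
    void commit(int xactId) {
        records_.push_back({RecType::End, xactId, nullptr, 0});
    }

    // Single backward scan: an END record is always met before the updates
    // of its transaction, so any Update whose transaction has no END yet
    // belongs to an unfinished transaction and is undone.
    // Returns the number of writes that were undone.
    int recover() {
        std::unordered_set<int> finished;
        int undone = 0;
        for (auto it = records_.rbegin(); it != records_.rend(); ++it) {
            if (it->type == RecType::End) {
                finished.insert(it->xactId);
            } else if (finished.count(it->xactId) == 0) {
                *it->addr = it->oldValue;  // physical undo: restore old value
                ++undone;
            }
        }
        records_.clear();  // all transactions are now committed or aborted
        return undone;
    }

private:
    std::vector<LogRecord> records_;
};
```

In this sketch the records of a committed transaction that sit between those of an unfinished one play the role of skip records: the backward scan has to step over them, which is why the cost of a one-layer scan grows with their number.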
Figure 9: Multithreaded B+-tree logging performance.

Concurrency. To test REWIND's fine-grained concurrency, we started multiple threads with each thread performing 100k operations on a B+-tree. Each operation is either an insert/delete pair or a lookup. The lookup-to-insert/delete ratio ranges from 20% to 80% (e.g., 30% lookups, 70% insert/delete). Each thread is assigned a ratio at the beginning and picks up operations from a pool of available tasks. We measured the total duration of the run, i.e., until all threads finished, as a function of the number of threads. REWIND uses its own library-level concurrency mechanisms. For Stasis and BerkeleyDB we let readers progress without locks but use locks for insert/delete pairs. This improves performance for BerkeleyDB as it eliminates deadlocks. For Shore-MT we use its own concurrency mechanisms for up to four threads as it creates one log partition for each core. Beyond that we found it better to use the same locking as Stasis and BerkeleyDB. As shown in Figure 9, the processing times of Stasis and BerkeleyDB grow linearly with the number of threads. Shore-MT, as expected, scales better than Stasis and BerkeleyDB up to the first four threads and then yields performance similar to BerkeleyDB. REWIND scales significantly better after three threads. The processing time for REWIND does not increase monotonically. This is due to the OS scheduling threads to the same core. Although we set the affinity of each task to a different core, the lightweight locking of REWIND results in threads finishing so fast that the OS seems to ignore that hint and schedules threads with different affinities to the same core.

Memory fence sensitivity. Memory fence latency varies depending on the storage architecture. We show how we can mitigate its impact by grouping log records. We repeat the benchmark of Figure 7 with the fraction of update queries set to 1 (the worst case scenario). We compare REWIND Optimized, which supports in-place updates but no grouping, with REWIND Batch for varying group sizes, e.g., REWIND Batch 8 uses 8 records per memory fence; we also include variations of 16 and 32 records per group.
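As a rough illustration of why grouping helps, the sketch below counts how many fences are needed to log a given number of records for different group sizes. It is a simplified model under stated assumptions, not REWIND's code: BatchedLog, FenceCounter and fencesFor are hypothetical names, the log lives in ordinary volatile memory, and std::atomic_thread_fence only stands in for a persistent memory fence (it orders CPU memory accesses but does not by itself guarantee persistence).

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified log record.
struct Record {
    int xactId;
    long oldValue;
};

// Stand-in for a persistent memory fence that counts how often it is issued.
class FenceCounter {
public:
    void fence() {
        std::atomic_thread_fence(std::memory_order_seq_cst);
        ++count_;
    }
    std::size_t count() const { return count_; }

private:
    std::size_t count_ = 0;
};

class BatchedLog {
public:
    BatchedLog(std::size_t groupSize, FenceCounter& fc)
        : groupSize_(groupSize), fc_(fc) {
        assert(groupSize > 0);
    }

    // Append a record; issue one fence once a full group is pending.
    void append(const Record& r) {
        records_.push_back(r);
        if (++pending_ == groupSize_) flush();
    }

    // Fence a partially filled group, e.g. before the guarded writes start.
    void flush() {
        if (pending_ == 0) return;
        fc_.fence();
        pending_ = 0;
    }

private:
    std::size_t groupSize_;
    FenceCounter& fc_;
    std::vector<Record> records_;
    std::size_t pending_ = 0;
};

// Hypothetical helper: fences issued when logging n records with groupSize.
inline std::size_t fencesFor(std::size_t n, std::size_t groupSize) {
    FenceCounter fc;
    BatchedLog batched(groupSize, fc);
    for (std::size_t i = 0; i < n; ++i) {
        batched.append({1, static_cast<long>(i)});
    }
    batched.flush();
    return fc.count();
}
```

With a group size of 1 every record pays for its own fence, whereas larger groups amortize one fence over several records, so the fence count falls roughly in proportion to the group size.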
Figure 10: Memory fence sensitivity.

Our results are shown in Figure 10. REWIND Optimized is affected and is slowed down by 5× while REWIND Batch has a slow-down of 1.63×, 1.32× and 1.18× for group sizes of 8, 16, and 32 respectively. We can therefore mitigate the fence cost of different storage architectures by tuning the group size. We also tested Stasis, a disk-replacement solution, which remained unaffected as expected. These results are in line with previous work [24]. In REWIND the optimizations of Section 3.3 are twofold as they mitigate the cost of the fence and also reduce the write overhead. Due to the lack of pages REWIND does not need to restrict the transactions of the group, i.e., force all transactions in the group to commit or abort.

5.3 TPC-C

We use a variant of the TPC-C benchmark to: (a) stress-test REWIND; and (b) show that by collapsing the boundaries between the in-memory and the persistent representations we can improve performance by co-designing the algorithms and the physical data layout. We implement the TPC-C schema with B+-trees for table storage and focus on the new order transaction. We use a scaling factor of one and use ten threads on our test machine to simulate the ten terminals issuing new order transactions, which is a slight deviation from TPC-C where a terminal can choose among five types of transactions. However, our goal is to measure the overhead in write-intensive operations and not test the features of a full-blown DBMS. Thus, the new order transaction is the best choice as it is the most write-intensive TPC-C transaction and the backbone of the entire workload.

We use four data layouts: standard persistent but not recoverable B+-trees in NVM; naive B+-trees over REWIND; an optimized layout of B+-trees over REWIND to represent compound keys; and the latter with a distributed log [24]. For REWIND with optimized B+-trees, we use an array of B+-trees to represent a table with a compound key. For the order tables (orders, order_line, and new_order), instead of having a B+-tree with a compound key on (warehouse_id, district_id, order_id) per table, we noted that the domains of warehouse and district consisted of one and ten values respectively, as there are ten districts in a single warehouse. Thus, we build an array of ten B+-trees, each on order_id. In REWIND, the use of distributed logging is up to the user. Using a single transaction manager for all transactions dictates a shared log, while a per-transaction manager implies a distributed log. This flexibility further enables co-design: through the persistence and recovery guarantees of REWIND, programmers can optimize the data structures and the implementation of transactions.

As per the TPC-C specifications, we abort 1% of transactions. In REWIND these transactions are rolled back while in the standard NVM version they are considered non-recoverable and ignored: this adds a significant overhead to the REWIND B+-tree. We do not compare to other systems as REWIND significantly outperformed them earlier.

Figure 11: TPC-C throughput.

In Figure 11, the non-recoverable implementation with naive data structures has a throughput of 273k transactions per minute (tpm). The optimized implementation over REWIND yields a throughput of 197k tpm for a 1.39× overhead. This highlights the potential of co-design: REWIND enables program-level workload-specific optimizations as persistence and recoverability need not be offloaded to a different runtime. Distributed logging improves the throughput even more to 262k tpm and a 1.05× overhead. REWIND with a naive implementation of the data structures gives a throughput of 37k tpm for a slowdown of 7.37× over the non-recoverable NVM version. This performance is in line with the microbenchmark results of Section 5.1 and the results of [24] for distributed logging.

6. RELATED WORK

Persistent virtual memory [26, 34] has received renewed interest through persistent regions [14]. Such attempts employ block-level I/O devices and file abstractions. Recoverability relies on staging persistence and logging through combining volatile main memory and persistent disk storage. Closer to our work, [19] uses battery-backed DRAM for persisting the file cache [3], but ultimately relies on I/O and uses a coarse-grained region approach to undo logging.

Two recent proposals [5, 31] provide NVM heaps to applications. We leverage this to support in-memory persistent data structures. Both [5, 31] only provide primitives for programmers to create and manage their own recovery protocols. Fang et al. [10] propose an NVM-based log manager for DBMSs, which, unlike our approach, relies on a client-server design and uses epoch barriers to guarantee persistence. Giles et al. [12] address embedded transaction management in user code, but unlike our work they require custom hardware to force the redo log to NVM before committing, while keeping user updates in a dedicated buffer before persisting the log; they do not elaborate on recovery mechanisms. More recent work [13] has similar goals to REWIND, but does not go as far in addressing recovery and concurrency: it only performs redo logging without in-place updates. Similarly, [35] embeds transaction management in user code but it assumes the existence of a non-volatile cache that it uses instead of logging. Finally, [2] studies the update semantics of NVM data in lock-based code (as opposed to transactional code) and touches only superficially on the mechanisms used for logging and recovery in NVM.

Prior to NVM, researchers proposed battery-backed DRAM and buffer manager extensions to support recoverability [21]. For instance, [6] uses battery-backed DRAM with an ARIES-like protocol, but, unlike us, it still assumes page-level I/O for data and log updates. DBMSs optimized for volatile memory [8, 17] are also relevant. These significantly improve on disk-based alternatives but are still suboptimal for NVM as they are subject to the inefficiencies of a block-based design towards durability.

Pelley et al. [24] propose distributed logging and group commits for mitigating the memory fence latency in NVM.
These are orthogonal to REWIND and we examine their effects in Sections 5.2 and 5.3, respectively. Similarly, [33] examines how NVM allows practical distributed logging. Unlike our approach, [33] targets page-level data and log updates. We compared this to REWIND in Section 5.2.

Recent work has considered more lightweight data management than full-blown DBMSs. For instance, [27] compares both DBMSs and file systems to custom alternatives, while [15] quantifies the overhead of several DBMS functionalities. There has also been recent interest in similarly extending file systems. For example, [23] presents an extended transactional and recoverable I/O interface for multiple, non-consecutive blocks. These still present a block interface to programmers unlike our byte-addressable approach. Finally, there has also been work on query processing algorithms, e.g., [4, 30], for NVM. These differ from our approach as they assume a complete DBMS instead of our programmer-managed data structures.

7. CONCLUSIONS AND FUTURE WORK

New NVM technologies allow programmers to maintain a single copy of their persistent data structures in main memory and access them directly with CPU loads and stores. This renders transactional recovery mechanisms based on block I/O and the separation of volatile and non-volatile data inappropriate. We presented REWIND, a user-mode library that directly manages persistent data structures in NVM in a recoverable way. The library provides a simple API and transparently handles recovery of critical data. Our results show that REWIND outperforms I/O-based solutions at a minimal overhead, thereby providing a promising path toward enabling persistent in-memory data structures.

As this is a fresh research area, there is more work to be done. Our overarching goal is to embed REWIND into a compiler framework à la software transactional memory. Further performance benefits will likely come if we implement the basic log structure using lock-free techniques. Another goal is to introduce autotuning so that the system adapts to the workload through monitoring.

Acknowledgments. The authors would like to thank the anonymous reviewers for their comments and the authors of [33] for their implementation of Shore-MT. This work was supported by the Intel University Research Office.

8. REFERENCES

[1] Apple Developer Library. Core Data Programming Guide, 2014.
[2] D. Chakrabarti and H.-J. Boehm. Durability semantics for lock-based multithreaded programs. In HOTPAR, 2013.
[3] P. M. Chen et al. The Rio file cache: Surviving operating system crashes. In ASPLOS, 1996.
[4] S. Chen et al. Rethinking database algorithms for phase change memory. In CIDR, 2011.
[5] J. Coburn et al. NV-heaps: Making persistent objects fast and safe with next-generation, non-volatile memories. In ASPLOS, 2011.
[6] G. Copeland et al. The case for safe RAM. In VLDB, 1989.
[7] B. Cornell et al. Wayback: A user-level versioning file system for Linux. In ATC, 2004.
[8] C. Diaconu et al. Hekaton: SQL Server's memory-optimized OLTP engine. In SIGMOD, 2013.
[9] S. R. Dulloor et al. System software for persistent memory. In EuroSys, 2014.
[10] R. Fang et al. High performance database logging using storage class memory. In ICDE, 2011.
[11] P. Felber et al. Transactifying applications using an open compiler framework. In TRANSACT, 2007.
[12] E. Giles et al. Bridging the programming gap between persistent and volatile memory using WrAP. In CF, 2013.
[13] E. Giles et al. Software support for atomicity and persistence in non-volatile memory. In MEAOW, 2013.
[14] J. Guerra et al. Software persistent memory. In ATC, 2012.
[15] S. Harizopoulos et al. OLTP through the looking glass, and what we found there. In SIGMOD, 2008.
[16] D. R. Hipp et al. SQLite Database, 2014.
[17] R. Kallman et al. H-Store: a high-performance, distributed main memory transaction processing system. PVLDB, 1(2), 2008.
[18] Linux Kernel. Linux Programmer's Manual, 2014.
[19] D. E. Lowell and P. M. Chen. Free transactions with Rio Vista. In SOSP, 1997.
[20] C. Mohan et al. ARIES: A transaction recovery method supporting fine-granularity locking and partial rollbacks using write-ahead logging. ACM TODS, 17(1), 1992.
[21] W. T. Ng and P. M. Chen. Integrating reliable memory in databases. In VLDB, 1997.
[22] Oracle Corporation. Oracle Berkeley DB 11g, 2014.
[23] X. Ouyang et al. Beyond block I/O: Rethinking traditional storage primitives. In HPCA, 2011.
[24] S. Pelley et al. Storage management in the NVRAM era. PVLDB, 7(2), 2014.
[25] M. K. Qureshi et al. Phase Change Memory: from devices to systems. Morgan & Claypool, 2012.
[26] M. Satyanarayanan et al. Lightweight recoverable virtual memory. In SOSP, 1993.
[27] R. Sears and E. Brewer. Stasis: Flexible transactional storage. In OSDI, 2006.
[28] R. P. Spillane et al. Enabling transactional file access via lightweight kernel extensions. In FAST, 2009.
[29] S. Venkataraman et al. Consistent and durable data structures for non-volatile byte-addressable memory. In FAST, 2011.
[30] S. D. Viglas. Write-limited sorts and joins for persistent memory. PVLDB, 7(5), 2014.
[31] H. Volos et al. Mnemosyne: Lightweight persistent memory. In ASPLOS, 2011.
[32] C. Wang et al. Code generation and optimization for transactional memory constructs in an unmanaged language. In CGO, 2007.
[33] T. Wang and R. Johnson. Scalable logging through emerging non-volatile memory. PVLDB, 7(10), 2014.
[34] M. Wu and W. Zwaenepoel. eNVy: A non-volatile, main memory storage system. In ASPLOS, 1994.
[35] J. Zhao et al. Kiln: Closing the performance gap between systems with and without persistence support. In MICRO, 2013.