Operating System Transactions


Donald E. Porter, Owen S. Hofmann, Christopher J. Rossbach, Alexander Benn, and Emmett Witchel
Department of Computer Sciences, The University of Texas at Austin
{porterde,osh,rossbach,abenn1,witchel}@cs.utexas.edu

ABSTRACT
Applications must be able to synchronize […]


[…] corrupt the files by simply editing them directly. The tools also use the sync() and rename commands to ensure that an individual file is not corrupted if the system crashes, but cannot ensure that an update to multiple files is consistently propagated. For instance, suppose a system crashes after useradd writes /etc/passwd but before it writes /etc/shadow. After rebooting the system, the new user will not be able to log on, yet useradd will fail because it thinks the user already exists, leaving the system administrator to manually repair the database files. The proliferation of tools to mitigate such a simple problem, as well as the tools' incompleteness, indicates that developers need a better API for consistent system accesses.

In practice, OS maintainers address the lack of concurrency control in the system call API in an ad hoc manner: new system calls and complex interfaces are added to solve new problems as they arise. The critical problem of eliminating file system race conditions has motivated Solaris and Linux developers to add over a dozen new system calls, such as openat, over the last seven years. Linux maintainers added a close-on-exec flag to fifteen system calls in a recent version of Linux [13] to eliminate a race condition between calls to open and fcntl. Individual file systems have introduced new operations to address consistency needs: the Google File System supports atomic append operations [16], while Windows recently adopted support for transactions in NTFS and the Windows registry [44]. Users should not be required to lobby OS developers for new system calls and file system features to meet their concurrent programming needs. Why not allow users to solve their own problems by supporting composition of multiple system calls into arbitrary atomic and isolated units?

This paper proposes system transactions to allow programmers to group accesses to system resources into logical units, which execute with atomicity, consistency, isolation, and durability (ACID). System transactions are easy to use: code regions with consistency constraints are enclosed between calls to sys_xbegin() and sys_xend(). The user can abort an in-progress transaction with sys_xabort(). Placing system calls within a transaction alters the semantics of when and how their results are published to the rest of the system. Outside of a transaction, actions on system resources are visible as soon as the relevant internal kernel locks are released. Within a transaction, all accesses are kept isolated until commit time, when they are atomically published to the rest of the system. System transactions provide a simple and powerful way for applications to express consistency requirements for concurrent operations to the OS.

This paper describes an implementation of system transactions on Linux called TxOS, which provides transactional semantics for OS resources, including the file system, memory management, signals, and process creation. To efficiently provide strong guarantees, the TxOS implementation redesigns several key OS data structures and internal subsystem interfaces. By making transactions a core OS abstraction, TxOS enables user and OS developers to create powerful applications and services. For example, given an initial implementation of TxOS, a single developer needed less than a month to prototype a transactional ext3 file system.

This paper makes two primary contributions. First, it describes a new approach to OS implementation that supports efficient transactions on commodity hardware with strong atomicity and isolation guarantees. Second, it demonstrates a prototype implementation of system transactions (TxOS) whose strong guarantees and good performance enable new solutions to systems problems such as:
1. Eliminating security vulnerabilities exploited by file system race conditions.
2. Rolling back an unsuccessful software install or upgrade without disturbing concurrent, unrelated updates. A transactional dpkg install adds only 10% overhead for this increase in safety.
3. Providing a lightweight alternative to a database for concurrency management and crash consistency, yielding simpler application code and system administration. Replacing Berkeley DB with flat files and system transactions as the storage back-end for the OpenLDAP directory service improves performance on write-mostly workloads by 2–4×.
4. Allowing user-level transactional programs to make system calls during a transaction.

The remainder of the paper is structured as follows. Section 2 provides motivating use-cases for system transactions and Section 3 describes programming with system transactions and their implementation in TxOS. Section 4 describes the design of TxOS, Section 5 provides kernel implementation details, and Section 6 describes how certain key subsystems provide transactional semantics. Section 7 measures the performance overhead of system transactions and evaluates TxOS in a number of application case studies. Section 8 positions TxOS in related work and Section 9 concludes.

2. MOTIVATING EXAMPLES
A range of seemingly unrelated application problems share a root cause: the lack of a general mechanism to ensure consistent access to system resources. This section reviews two common application consistency problems and how system transactions remedy those problems. System transactions allow software installations to recover from failures without disrupting concurrent, independent updates to the file system. System transactions also eliminate race conditions inherent in the file system API, which can be exploited to undermine security.

2.1 Software installation or upgrade
Installing new software or software patches is an increasingly common system activity as time-to-market pressures and good network connectivity combine to make software updates frequent for users. Yet software upgrade remains a dangerous activity. For example, Microsoft recalled a pre- […]
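A minimal user-space model can make the sys_xbegin()/sys_xend() programming model concrete for the useradd example. This is a sketch, not TxOS code: `ToyFS`, `Txn`, `begin()`, `commit()`, `abort()` and `add_user()` are hypothetical names, an in-memory dictionary stands in for the file system, and conflict detection is omitted. It shows the two properties the text relies on: updates made inside a transaction stay private until commit (isolation), and /etc/passwd and /etc/shadow become visible together or not at all (atomicity). A failure is modeled as an explicit abort; surviving a real crash would additionally require durability (e.g., a journal).

```python
class ToyFS:
    """In-memory stand-in for the file system (hypothetical, not TxOS)."""

    def __init__(self):
        self.files = {}  # committed state: path -> contents

    def read(self, path, txn=None):
        # Inside a transaction, a read sees that transaction's own writes first.
        if txn is not None and path in txn.writes:
            return txn.writes[path]
        return self.files.get(path, "")

    def begin(self):  # plays the role of sys_xbegin()
        return Txn(self)


class Txn:
    def __init__(self, fs):
        self.fs = fs
        self.writes = {}  # private updates, invisible to everyone else
        self.active = True

    def write(self, path, data):
        if not self.active:
            raise RuntimeError("transaction already finished")
        self.writes[path] = data

    def commit(self):  # plays the role of sys_xend()
        self.fs.files.update(self.writes)  # publish all updates together
        self.active = False
        return True

    def abort(self):  # plays the role of sys_xabort()
        self.writes.clear()  # discard speculative state; nothing to undo
        self.active = False


def add_user(fs, name, fail_before_shadow=False):
    """Append to /etc/passwd and /etc/shadow as one atomic unit."""
    txn = fs.begin()
    passwd = fs.read("/etc/passwd", txn)
    txn.write("/etc/passwd", passwd + f"{name}:x:1001:1001::/home/{name}:/bin/sh\n")
    if fail_before_shadow:
        txn.abort()  # models a failure between the two writes
        return False
    shadow = fs.read("/etc/shadow", txn)
    txn.write("/etc/shadow", shadow + f"{name}:!:19000:0:99999:7:::\n")
    return txn.commit()
```

Because writes are buffered rather than applied in place, aborting is just discarding a dictionary. This loosely mirrors lazy version management, although TxOS versions individual kernel objects rather than whole files.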
[…]tics, and behavior of system transactions, followed by an overview of how system transactions are supported in TxOS, our prototype implementation of system transactions within Linux.

3.1 System transactions
System transactions provide ACID semantics for updates to OS resources, such as files, pipes, and signals. In this programming model, both transactional and non-transactional system calls may access the same system state; the OS is responsible for ensuring that these accesses are correctly serialized and contention is arbitrated fairly. The interface for system transactions is intuitive and simple, allowing a programmer to wrap a block of unmodified code in a transaction simply by adding sys_xbegin() and sys_xend().

3.1.1 System transaction semantics
System transactions share several properties developers are likely familiar with from database transactions. System transactions are serializable and recoverable. Reads may only observe committed data and are repeatable, which corresponds to the highest database isolation level (level 3 [18]). Transactions are atomic (the system can always roll back to a pre-transaction state) and durable (transaction results, once committed, survive system crashes).

To ensure isolation, the kernel enforces the invariant that a kernel object may only have one writer at a time, except for containers, which allow multiple writers to disjoint entries. Two concurrent system transactions cannot both successfully commit if they access the same kernel objects and at least one of the accesses is a write. Such transactions are said to conflict, and the system will detect the conflict and abort one of the transactions. Non-transactional updates to objects read or written by an active system transaction are also prevented by the system. Either the system suspends the non-transactional work before the update, or it aborts the transaction. By preventing conflicting accesses to the same kernel object, the system provides conflict serializability, which is commonly used to enforce serializability efficiently.

System transactions make durability optional because durability often increases transaction commit latency and the programmer does not always need it. The increased commit latency comes from flushing data to a slow block storage device, like a disk. Eliminating the TOCTTOU race in
the file system namespace is an example of a system transaction that does not require durability. Durability for system transactions in TxOS is under the control of the programmer, using a flag to sys_xbegin() (Table 2).

Each kernel thread may execute a system transaction. Transactional updates are isolated from all other kernel threads, including threads in different processes. We call a kernel thread executing a system transaction a transactional kernel thread.

3.1.2 Interaction of transactional and non-transactional threads
The OS serializes system transactions and non-transactional system calls, providing the strongest guarantees and most intuitive semantics [18] to the programmer. The serialization of transactional and non-transactional updates to the same resources is called strong isolation [5]. Previous OS transaction designs have left the interaction of transactions with non-transactional activity semantically murky. Intuitive semantics for mixing transactional and non-transactional access to the same resources is crucial to maintaining a simple interface to system resources. Strong isolation prevents unexpected behavior due to non-transactional and transactional applications accessing the same system resources.

The presence of system transactions does not change the behavior of non-transactional activity in the underlying operating system. While most system calls are already isolated and atomic, there are important exceptions. For example, Linux does not serialize read with write. On an OS with system transactions, non-transactional system calls can still exhibit non-serializable behavior with respect to each other, but non-transactional system calls serialize with transactions. For example, one or more calls to read in a system transaction will correctly serialize with a non-transactional write.

3.1.3 System transaction progress
The operating system guarantees that system transactions do not livelock with other system transactions. When two transactions, A and B, cannot both commit, the system selects one to restart (let's say B in this example), and ensures its decision remains consistent. If A continues and B restarts and again conflicts with A, the OS will again restart B. See §5.2.1 for details.

Guaranteeing progress for transactional threads in the presence of non-transactional threads requires support from the OS. If an OS supports preemption of kernel threads (present in Linux 2.4 and 2.6 since 2004), then it can guarantee progress for long running transactions by preempting non-transactional threads that would impede progress of the transaction.

The OS has several mechanisms to regulate the progress of transactions, but the use of these mechanisms is a matter of policy. For instance, allowing a long running transaction to isolate all system resources indefinitely is undesirable, so the OS may want a policy that limits the size of a transaction. Limiting a transaction that over-consumes system resources is analogous to controlling any process that abuses system resources, such as memory, disk space, or kernel threads.

3.1.4 System transactions for system state
Although system transactions provide ACID semantics for system state, they do not provide these semantics for application state. System state includes OS data structures and device state stored in the operating system's address space, whereas application state includes only the data structures […]

Function name                    Description
int sys_xbegin(int flags)        Begin a transaction. The flags specify transactional behavior, including automatically restarting the transaction after an abort, ensuring that committed results are on stable storage (durable), and aborting if an unsupported system call is issued. Returns status code.
int sys_xend()                   End of transaction. Returns whether commit succeeded.
void sys_xabort(int no_restart)  Aborts a transaction. If the transaction was started with restart, setting no_restart overrides that flag and does not restart the transaction.
Table 2: TxOS API

4. TXOS DESIGN
System transactions guarantee strong isolation for transactions, while retaining good performance and simple interfaces. This section outlines how the TxOS design achieves these goals.

4.1 Interoperability and fairness
TxOS allows flexible interaction between transactional and non-transactional kernel threads. TxOS efficiently provides strong isolation inside the kernel by requiring all system calls to follow the same locking discipline, and by requiring that transactions annotate accessed kernel objects. When a thread, transactional or non-transactional, accesses a kernel object for the first time, it must check for a conflicting annotation. The scheduler arbitrates conflicts when they are detected. In many cases, this check is performed at the same time as a thread acquires a lock for the object.

Interoperability is a weak spot for previous transactional systems. In most transactional systems, a conflict between a transaction and a non-transactional thread (called an asymmetric conflict [41]) must be resolved by aborting the transaction. This approach undermines fairness. In TxOS, because asymmetric conflicts are often detected before a non-transactional thread enters a critical region, the scheduler has the option of suspending the non-transactional thread, allowing for fairness between transactions and non-transactional threads.

4.2 Managing transactional state
Databases and historical transactional operating systems typically update data in place and maintain an undo log. This approach is called eager version management [25]. These systems isolate transactions by locking data when it is accessed and holding the lock until commit. This technique is called two-phase locking, and it usually employs locks that distinguish read and write accesses. Because applications generally do not have a globally consistent order for data accesses, these systems can deadlock. For example, one thread might read file A then write file B, while a different thread might read file B, then write file A.

The possibility of deadlock complicates the programming model of eager versioning transactional systems. Deadlock is commonly addressed by exposing a timeout parameter to users. Setting the timeout properly is a challenge. If it is too short, it can starve long-running transactions. If it is too long, it can destroy the performance of the system.

Eager version management degrades responsiveness in ways that are not acceptable for an operating system. If an interrupt handler, high priority thread, or real-time thread aborts a transaction, it must wait for the transaction to process its undo log (to restore the pre-transaction state) before it can safely proceed. This wait jeopardizes the system's ability to meet its timing requirements.

TxOS, in contrast, uses lazy version management, where transactions operate on private copies of a data structure. Applications never hold kernel locks across system calls. Lazy versioning requires TxOS to hold locks only long enough to make a private copy of the relevant data structure. By enforcing a global ordering for kernel locks, TxOS avoids deadlock. TxOS can abort transactions instantly: the winner of a conflict does not incur latency for the aborted transaction to process its undo log.

The primary disadvantage of lazy versioning is the commit latency due to copying transactional updates from the speculative version to the stable version of the data structures. As we discuss in Section 5, TxOS minimizes this overhead by splitting objects, turning a memcpy of the entire object into a pointer copy.

4.3 Integration with transactional memory
System transactions protect system state, not application state. For multi-threaded programs, the OS has no efficient mechanism to save and restore the memory state of an individual thread. User-level transactional memory (TM) systems, however, are designed to provide efficient transactional semantics to memory modifications by a thread, but cannot isolate or roll back system calls. Integrating user and system transactions creates a simple and complete transactional programming model.

System transactions fix one of the most troublesome limitations of transactional memory systems: system calls are disallowed during user transactions because they violate transactional semantics. System calls on traditional operating systems are not isolated, and they cannot be rolled back if a transaction fails. For example, a file append performed inside a hardware or software user transaction can occur an arbitrary number of times. Each time the user-level transaction aborts and retries, it repeats the append.

On a TM system integrated with TxOS, when a TM application makes a system call, the runtime begins a system transaction. The user-level transactional memory system handles buffering and possibly rolling back the application's memory state, and the system transaction buffers updates to system state. The updates to system state are committed or aborted by the kernel atomically with the commit or abort of the user-level transaction. The programmer sees the simple abstraction of an atomic block that can contain updates […]

[…] instance, the inode_header contains both file metadata (owner, permissions, etc.) and the mapping of file blocks to cached pages in memory (i_data). A process may often read or write a file without updating the metadata. TxOS versions these objects separately, allowing metadata operations and data operations on the same file to execute concurrently when it is safe.

Read-only objects. Many kernel objects are only read in a transaction, such as the parent directories in a path lookup. To avoid the cost of making shadow copies, kernel code can specify read-only access to an object, which marks the object data as read-only for the length of the transaction. Each data object has a transactional reader reference count. If a writer wins a conflict for an object with a non-zero reader count, it must create a new copy of the object and install it as the new stable version. The OS garbage collects the old copy via read-copy update (RCU) [29] when all transactional readers release it and after all non-transactional tasks have been descheduled. This constraint ensures that all active references to the old, read-only version have been released before it is freed and all tasks see a consistent view of kernel data. The only caveat is that a non-transactional task that blocks must re-acquire any data objects it was using after waking, as they may have been replaced and freed by a transaction commit. Although it complicates the kernel programming model slightly, marking data objects as read-only in a transaction is a structured way to eliminate substantial overhead for memory allocation and copying. Special support for read-mostly transactions is a common optimization in transactional systems, and RCU is a technique to support efficient, concurrent access to read-mostly data.

5.2 Conflict detection and resolution
As discussed in Section 4.1, TxOS serializes transactions with non-transactional
activity as well as with other transactions. TxOS serializes non-transactional accesses to kernel objects with transactions by leveraging the current locking practice in Linux and augmenting stable objects with information about transactional readers and writers. Both transactional and non-transactional threads use this information to detect accesses that would violate conflict serializability when they acquire a kernel object.

Conflicts occur when a transaction attempts to write an object that has been read or written by another transaction. An asymmetric conflict is defined similarly: a non-transactional thread attempts to write an object a transaction has read or written, or vice versa. TxOS embeds a tx_data object in the header portion of all shared kernel objects that can be accessed within a transaction. The tx_data object includes a pointer to a transactional writer and a reader list. A non-null writer pointer indicates an active transactional writer, and an empty reader list indicates there are no readers. Locks prevent transactions from acquiring an object that is concurrently accessed by a non-transactional thread. When a thread detects a conflict, TxOS uses these fields to determine which transactions are in conflict; the conflict is then arbitrated by the contention manager (§5.2.1). Note that the reader list is attached to the stable header object, whereas the reader count (§5.1) is used for garbage collecting obsolete data objects. By locking and testing the transactional readers and writer fields, TxOS detects transactional and asymmetric conflicts.

5.2.1 Contention Management
When a conflict is detected between two transactions or between a transaction and a non-transactional thread, TxOS invokes the contention manager to resolve the conflict. The contention manager is kernel code that implements a policy to arbitrate conflicts among transactions, dictating which of the conflicting transactions may continue. All other conflicting transactions must abort.

As a default policy, TxOS adopts the osprio policy [43]. Osprio always selects the process with the higher scheduling priority as the winner of a conflict, eliminating priority and policy inversion in transactional conflicts. When processes with the same priority conflict, the older transaction wins (a policy known as timestamp [40]), guaranteeing liveness for transactions within a given priority level.

5.2.2 Asymmetric conflicts
A conflict between a transactional and non-transactional thread is called an asymmetric conflict. Transactional threads can always be aborted and rolled back, but non-transactional threads cannot be rolled back. TxOS must have the freedom to resolve an asymmetric conflict in favor of the transactional thread; otherwise, non-transactional threads will always win asymmetric conflicts, undermining fairness in the system and possibly starving transactions.

While non-transactional threads cannot be rolled back, they can often be preempted, which allows them to lose conflicts with transactional threads. Kernel preemption is a recent feature of Linux that allows the kernel to preemptively deschedule threads executing system calls inside the kernel, unless they are inside of certain critical regions. In TxOS, non-transactional threads detect conflicts with transactional threads before they actually update state, usually when they acquire a lock for a kernel data structure. A non-transactional thread can simply deschedule itself if it loses a conflict and is in a preemptible state. If a non-transactional, non-preemptible process aborts a transaction too many times, the kernel can still prevent it from starving the transaction by placing the non-transactional process on a wait queue the next time it makes a system call. The kernel reschedules the non-transactional process only after the transaction commits.

Linux can preempt a kernel thread if the thread is not holding a spinlock and is not in an interrupt handler. TxOS has the additional restriction that it will not preempt a conflicting thread that holds one or more mutexes (or semaphores). Otherwise, TxOS risks a deadlock with a transaction that […]
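The conflict rules of §5.2 and the default osprio/timestamp policy of §5.2.1 can be summarized in a small user-space model. This is a sketch under stated assumptions, not kernel code: `Tx`, `TxData`, `conflicting()`, `osprio_winner()` and `acquire()` are hypothetical names; a larger `prio` value is assumed to mean higher scheduling priority; and a smaller `start_ts` means an older transaction. Only transaction-versus-transaction conflicts are modeled; in TxOS a non-transactional thread that loses an asymmetric conflict is descheduled rather than aborted.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(eq=False)  # identity comparison: two transactions are never "equal"
class Tx:
    tid: int
    prio: int      # assumption: larger value = higher scheduling priority
    start_ts: int  # smaller value = older transaction


@dataclass
class TxData:
    """Per-object annotation, loosely modeled on TxOS's tx_data header."""
    writer: Optional[Tx] = None                      # non-None = active writer
    readers: List[Tx] = field(default_factory=list)  # empty = no readers


def conflicting(obj: TxData, tx: Tx, is_write: bool) -> List[Tx]:
    """Other transactions that tx would conflict with by accessing obj."""
    others = []
    if obj.writer is not None and obj.writer is not tx:
        others.append(obj.writer)  # any access conflicts with another writer
    if is_write:
        # A write also conflicts with every other transactional reader.
        others += [r for r in obj.readers if r is not tx and r not in others]
    return others


def osprio_winner(a: Tx, b: Tx) -> Tx:
    """osprio: higher priority wins; equal priority falls back to timestamp."""
    if a.prio != b.prio:
        return a if a.prio > b.prio else b
    return a if a.start_ts < b.start_ts else b  # older transaction wins


def acquire(obj: TxData, tx: Tx, is_write: bool, aborted: Set[int]) -> bool:
    """Annotate obj for tx if tx wins every conflict; otherwise abort tx."""
    others = conflicting(obj, tx, is_write)
    if not all(osprio_winner(tx, o) is tx for o in others):
        aborted.add(tx.tid)  # tx loses and will restart
        return False
    for o in others:  # tx wins: every loser aborts and drops its annotation
        aborted.add(o.tid)
        if obj.writer is o:
            obj.writer = None
        if o in obj.readers:
            obj.readers.remove(o)
    if is_write:
        obj.writer = tx
    elif tx not in obj.readers:
        obj.readers.append(tx)
    return True
```

Deciding the winner against all conflicting transactions before aborting any of them keeps the outcome consistent: either the requester proceeds and every opponent restarts, or only the requester restarts.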
Figure 4: The major steps involved in committing Transaction A with inode 57 in its workset, changing the mode from 0777 to 0755. The commit code first locks the inode. It then replaces the inode header's data pointer with a pointer to the shadow inode. Finally, Transaction A frees the resources used for transactional bookkeeping and unlocks the inode.

[…]cient for guaranteeing that memory addresses remain unchanged for the duration of the transaction.

5.4 Commit protocol
When a system transaction calls sys_xend(), it is ready to begin the commit protocol. The flow of the commit protocol is shown in Figure 4. In the first step, the transaction acquires locks for all items in its workset. The workset is kept sorted according to the kernel locking discipline to enable fast commit and eliminate the possibility of deadlock among committing transactions. Specifically, objects are sorted by the kernel virtual address of the header, followed by lists sorted by kernel virtual address. Lists are locked last to maintain an ordering with the directory traversal code.

TxOS iterates over the objects twice, once to acquire the blocking locks and a second time to acquire non-blocking locks. TxOS is careful to acquire blocking locks before spinlocks, and to release spinlocks before blocking locks. Acquiring or releasing a mutex or semaphore can cause a process to sleep, and sleeping with a held spinlock can deadlock the system.

After acquiring all locks, the transaction does a final check of its status word with an atomic compare-and-swap instruction. If it has not been set to ABORTED, then the transaction can successfully commit (this CAS instruction is the transaction's linearization point [23]). The committing process holds all relevant object locks during commit, thereby excluding any transactional or non-transactional threads that would compete for the same objects.

After acquiring all locks, the transaction copies its updates to the stable objects. The transaction's bookkeeping data are removed from the objects, then the locks are released. Between releasing spinlocks and mutexes, the transaction performs deferred operations (like memory allocations/frees and delivering file system monitoring events) and performs any pending writes to stable storage.

During commit, TxOS holds locks that are not otherwise held at the same time in the kernel. As a result, TxOS extends the locking discipline slightly, for instance by requiring that rename lock inode entries in order of kernel virtual address. TxOS also introduces additional fine-grained locking on objects, such as lists, that are not locked in Linux. Although these additional constraints complicate the locking discipline, they also allow TxOS to elide coarse-grained locks such as the dcache_lock, which protects updates to the hash table of directory entries cached in memory. By eliminating these coarse-grained locks, TxOS improves performance scalability for individual system calls.

5.5 Abort Protocol
If a transaction detects that it loses a conflict, it must abort. The abort protocol is similar to the commit protocol, but simpler because it does not require all objects to be locked at once. If the transaction is holding any kernel locks, it first releases them to avoid stalling other processes. The transaction then iterates over its working set and locks each object, removes any references to itself from the object's transactional state, and then unlocks the object. Next, the transaction frees its shadow objects and decrements the reference count on their stable counterparts. The transaction walks its undo log to release any other resources, such as memory allocated within the transaction.

5.6 User-level transactions
In order for a user-level transactional memory system to use system transactions, the TM system must coordinate commit of application state with commit of the system transaction. This section provides commit protocols for the major classes of TM implementations.

5.6.1 Lock-based STM requirements
TxOS uses a simplified variant of the two-phase commit protocol (2PC) [17] to coordinate commit of a lock-based user-level software transactional memory (STM) transaction with a system transaction. The TxOS commit consists of the following steps.
1. The user prepares a transaction.
2. The user requests that the system commit the transaction through the sys_xend() system call.
3. The system commits or aborts.
4. The system communicates the outcome to the user through the sys_xend() return code.
5. The user commits or aborts in accordance with the outcome of the system transaction.

This protocol naturally follows the flow of control between the user and kernel, but requires the user transaction system to support the prepared state. We define a prepared transaction as being finished (it will add no more data to its working set), safe to commit (it has not currently lost any conflicts with other threads), and guaranteed to remain able to commit (it will win all future conflicts until the end of the protocol). In other words, once a transaction is prepared, another thread must stall or roll back if it tries to perform a conflicting operation. In a system that uses locks to protect a commit, prepare is accomplished by simply holding all of the locks required for the commit during the sys_xend() call. On a successful commit, the system commits its state before the user, but any competing accesses to the shared state are serialized after the user commit.

Depending on the implementation details of the user TM implementation, additional integration effort may be required of the STM implementation. For instance, a lazy versioned STM needs to ensure that a transactional write system call is issued with the correct version of the buffer. As an optimization, the STM runtime can check the return code on system calls within a transaction to detect an aborted system transaction sooner. For the TM systems we examined, coordinating commit and adding extra return checks were sufficient.

5.6.2 HTM and obstruction-free STM requirements
Hardware transactional memory (HTM) and obstruction-free software TM systems [22] use a single instruction (xend and compare-and-swap, respectively) to perform their commits. For these systems, a prepare stage is unnecessary. Instead, the commit protocol should have the kernel issue the commit instruction on behalf of the user once the kernel has validated its workset. Both the system and user-level transaction commit or abort together depending upon the result of this specific commit instruction.

For HTM support, TxOS requires that the hardware allow the kernel to suspend user-initiated transactions on entry to the kernel. Every HTM proposal that supports an OS [32, 43, 58] supports mechanisms that suspend user-initiated transactions, avoiding the mixture of user and kernel addresses in the same hardware transaction. Mixing user and kernel addresses creates a security vulnerability in most HTM proposals. Also, the kernel needs to be able to issue an xend instruction on behalf of the application.

Though TxOS supports user-level HTM, it runs on commodity hardware and does not require any special HTM support itself.

6. TxOS KERNEL SUBSYSTEMS
This section discusses how various kernel subsystems support ACI[D] semantics in TxOS. In several cases, transactional semantics need not be developed from scratch, but are implemented by extending functionality already present in the subsystem. For example, we use the journal in ext3 to provide true, multi-operation durability. We leverage Linux's support for deferring signal delivery to manage signals sent to and from transactional threads.

6.1 Transactional file system
TxOS simplifies the task of writing a transactional file system by detecting conflicts and managing versioned data in the virtual file system layer. The OS provides the transactional semantics: versioning updates and detecting conflicts. The file system need only provide the ability to atomically commit updates to stable storage (e.g., via a journal). By ensuring that all committed changes are written in a single journal transaction, we converted ext3 into a transactional file system. Memory-only file systems, such as proc and tmpfs, are automatically transactional when used within system transactions.

6.2 Multi-process transactions
A dominant paradigm for UNIX application development is the composition of simple but powerful utility programs into more complex tasks. Following this pattern, applications may wish to transactionally fork a number of child processes to execute utilities and wait for the results to be returned through a pipe.

To support this programming paradigm in a natural way, TxOS allows multiple threads to participate in the same transaction. The threads in a transaction may share an address space, as in a multithreaded application, or the threads may reside in different address spaces. Threads in the same transaction share and synchronize access to speculative state.

When a process forks a child inside a transaction, the child process executes within the active transaction until it performs a sys_xend() or it exits (where an exit is considered an implicit sys_xend()). The transaction commits when all tasks in the transaction have issued a sys_xend(). This method of process management allows transactional programs to call high-level convenience functions, like system, to easily create processes using the full complement of shell functionality. Such exec'ed programs run with transactional semantics, though they might not contain any explicitly transactional code. After a child process commits, it is no longer part of the transaction and subsequent sys_xbegin() calls will begin transactions that are completely independent from the parent.

System calls that modify process state, for example by allocating memory or installing signal handlers, are faster in transactionally forked tasks because they do not checkpoint the process's system state. An abort will simply terminate the process; no other rollback is required.

6.3 Signal delivery
Signal semantics in TxOS provide isolation among threads in different transactions, as well as isolation between non-transactional and transactional threads. Any signal sent to a thread not part of the source's transaction is deferred until commit by placing it in a deferral queue, regardless of whether the receiving thread is transactional. Signals in the queue are delivered in order if the transaction commits, and discarded if the transaction aborts.

When an application begins a transaction, a flag to sys_xbegin() specifies whether incoming signals should be delivered speculatively within the transaction (speculative delivery) or deferred until commit (deferred delivery). Speculative delivery enables transactional applications to be more responsive to input. When signals are delivered speculatively […]
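The commit steps of §5.4 (sorted lock acquisition in two passes, a compare-and-swap on the status word as the linearization point, a pointer swap to publish shadow data, and releasing spinlocks before blocking locks) can be sketched in user-space Python. Assumptions: `KObj`, `Transaction` and `Status` are hypothetical names; `threading.Lock` stands in for both mutexes and spinlocks, with the `blocking` flag only controlling ordering; `id()` stands in for the kernel virtual address used as the sort key; and a small lock emulates the atomic CAS instruction. The example data mirrors Figure 4, changing an inode's mode from 0777 to 0755.

```python
import threading
from enum import Enum


class Status(Enum):
    ACTIVE = 0
    ABORTED = 1
    COMMITTED = 2


class KObj:
    """A kernel object header: a lock plus a pointer to its stable data."""

    def __init__(self, data, blocking=True):
        self.lock = threading.Lock()  # stand-in for a mutex or a spinlock
        self.blocking = blocking      # True: mutex-like, False: spinlock-like
        self.data = data              # stable version


class Transaction:
    def __init__(self):
        self.status = Status.ACTIVE
        self._status_lock = threading.Lock()  # emulates an atomic CAS
        self.workset = {}  # KObj -> shadow (private) copy of its data

    def cas_status(self, expected, new):
        with self._status_lock:
            if self.status is expected:
                self.status = new
                return True
            return False

    def commit(self):
        # Global lock order: id() plays the role of the kernel virtual address.
        objs = sorted(self.workset, key=id)
        blocking = [o for o in objs if o.blocking]
        spinning = [o for o in objs if not o.blocking]
        for o in blocking:   # pass 1: blocking locks first
            o.lock.acquire()
        for o in spinning:   # pass 2: then non-blocking (spin) locks
            o.lock.acquire()
        try:
            # Linearization point: commit only if nobody marked us ABORTED.
            if not self.cas_status(Status.ACTIVE, Status.COMMITTED):
                return False
            for o, shadow in self.workset.items():
                o.data = shadow  # publish by pointer swap, not a deep copy
            return True
        finally:
            for o in reversed(spinning):  # release spinlocks before blocking locks
                o.lock.release()
            # TxOS would run deferred work (frees, notifications) at this point.
            for o in reversed(blocking):
                o.lock.release()
```

Because every committer acquires locks in the same global order, two committing transactions cannot deadlock on each other's locks, which is the property the sorted workset is meant to provide.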
[…]chine has 16 1000 MHz CPUs, each with a 32 KB level 1 and 4 MB level 2 cache. An L1 miss costs 24 cycles and an L2 miss costs 350 cycles. The HTM uses the timestamp contention management policy and linear backoff on restart.

7.1 Single-thread system call overheads
A key goal of TxOS is to make transaction support efficient, taking special care to minimize the overhead non-transactional applications incur. To evaluate performance overheads for substantial applications, we measured the average compilation time across three non-transactional builds of the Linux 2.6.22 kernel on unmodified Linux (3 minutes, 24 seconds), and on TxOS (3 minutes, 28 seconds). This slowdown of less than 2% indicates that for most applications, the non-transactional overheads will be negligible. At the scale of a single system call, however, the average overhead is currently 29%, and could be cut to 14% with improved compiler support.

Table 4 shows the performance of common file system system calls on TxOS. We ran each system call 1 million times, discarding the first and last 100,000 measurements and averaging the remaining times. The elapsed cycles were measured using the rdtsc instruction. The purpose of the table is to analyze transaction overheads in TxOS, but it is not a realistic use case, as most system calls are already atomic and isolated. Wrapping a single system call in a transaction is the worst case for TxOS performance because there is very little work across which to amortize the cost of creating shadow objects and commit.

The Base column shows the base overhead from adding transactions to Linux. These overheads have a geometric mean of 3%, and are all below 20%, including a performance improvement for write. Overheads are incurred mostly by increased locking in TxOS and the extra indirection necessitated by data structure reorganization (e.g., separation of header and data objects). These low overheads show that transactional support does not significantly slow down non-transactional activity.

TxOS replaces simple linked lists with a more complex transactional list (§5.2.3). The transactional list allows more concurrency, both by eliminating transactional conflicts and by introducing fine-grained locking on lists, at the expense of higher single-thread latency. The Static column adds the latencies due to transactional lists to the base overheads (roughly 10%, though more for link).

The Static column assumes that TxOS can compile two versions of all system calls: one used by transactional threads and the other used by non-transactional threads. Our TxOS prototype uses dynamic checks, which are frequent and expensive. With compiler support, the overheads in the Static column are achievable.

The NoTx column presents measurements of the current TxOS prototype, with dynamic checks to determine if a thread is executing a transaction. The Bgnd Tx column shows non-transactional system call overheads for TxOS while there is an active system transaction in a different thread. Non-transactional system calls need to perform extra work to detect conflicts with background transactions. The In Tx column shows the overhead of the system call in a system transaction. This overhead is high, but represents a rare use case. The Tx column includes the overheads of the sys_xbegin() and sys_xend() system calls.

7.2 Applications and micro-benchmarks
Table 5 shows the performance of TxOS on a range of applications and micro-benchmarks. Each measurement is the average of three runs. The slowdown relative to Linux is also listed. Postmark is a file system benchmark that simulates the behavior of an email, network news, and e-commerce client. We use version 1.51 with the same transaction boundaries as Amino [56]. The LFS small file benchmark operates on 10,000 1024-byte files, and the large file benchmark reads and writes a 100 MB file. The Reimplemented Andrew Benchmark (RAB) is a reimplementation of the Modified Andrew Benchmark, scaled for modern computers. Initially, RAB creates 500 files, each containing 1000 bytes of pseudo-random printable-ASCII content. Next, the benchmark measures execution time of four distinct phases: the mkdir phase creates 20,000 directories; the cp phase copies the 500 generated files into 500 of these directories, resulting in 250,000 copied files; the du phase calculates the disk usage of the files and directories with the du command; and the grep/sum phase searches the files for a short string that is not found and checksums their contents. The sizes of the mkdir and cp phases are chosen to take roughly similar amounts of time on our test machines. In the transactional version, each phase is wrapped in a transaction. Make wraps a software compilation in a transaction. Dpkg and Install are software inst […]
allationbenchmarksthatwraptheentireinstal-lationinatransaction,asdiscussedinthefollowingsubsec-tion.Acrossmostworkloads,theoverheadofsystemtransac-tionsisquitereasonable(1–2),andoftensystemtransac-tionsspeeduptheworkload(e.g.,postmark,LFSsmalllecreate,RABmkdirandcpphases).Benchmarksthatre-peatedlywritelesinatransaction,suchastheLFSlargelebenchmarksequentialwriteortheLFSsmalllecreatephase,aremoreefcientthanLinux.TransactioncommitgroupsthewritesandpresentsthemtotheI/Oschedulerallatonce,improvingdiskarmschedulingand,onext2andext3,increasinglocalityintheblockallocations.Write-intensiveworkloadsoutperformnon-transactionalwritersbyasmuchasafactorof29.7.TxOSrequiresextramemorytobufferupdates.Wesur-veyedseveralapplications'memoryoverheads,andfocushereontheLFSsmallandlargebenchmarksastworep-resentativesamples.Becausetheutilizationpatternsvaryacrossdifferentportionsofphysicalmemory,weconsiderlowmemory,whichisusedforkerneldatastructures,sep-aratelyfromhighmemory,whichcanbeallocatedtoappli- Backend Search Search Add Del Single Subtree BDB 3229 2076 203 172 LDIF 3171 2107 1032(5.1) 2458(14.3) LDIF-TxOS 3124 2042 413(2.0) 714(4.2) 
Table 6: Throughput in queries per second of OpenLDAP's slapd server (higher is better) for a read-only and write-mostly workload. For the Add and Del workloads, the increase in throughput over BDB is listed in parentheses. The BDB storage module uses Berkeley DB, LDIF uses a flat file with no consistency for updates, and LDIF-TxOS augments the LDIF storage module to use system transactions on a flat file. LDIF-TxOS provides the same crash consistency guarantees as BDB with more than double the write throughput.

age module (called LDIF) to use system transactions. The OpenLDAP server supports a number of storage modules; the default is Berkeley DB (BDB). We used the SLAMD distributed load generation engine¹ to exercise the server, running in single-thread mode. Table 6 shows throughput for the unmodified Berkeley DB storage module, the LDIF storage module augmented with a simple cache, and LDIF using system transactions. The “Search Single” experiment exercises the server with single item read requests, whereas the “Search Subtree” column submits requests for all entries in a given directory subtree. The “Add” test measures throughput of adding entries, and “Del” measures the throughput of deletions.

The read performance (search single and search subtree) of each storage module is within 3%, as most reads are served from an in-memory cache. LDIF has 5–14× the throughput of BDB for requests that modify the LDAP database (add and delete). However, the LDIF module does not use file locking, synchronous writes, or any other mechanism to ensure consistency. LDIF-TxOS provides ACID guarantees for updates. Compared to BDB, the read performance is similar, but workloads that update LDAP records using system transactions outperform BDB by 2–4×. LDIF-TxOS provides the same guarantees as the BDB storage module with respect to concurrency and recoverability after a crash.

¹ http://www.slamd.com/

7.5 Transactional ext3

In addition to measuring the overheads of durable transactions, we validate the correctness of our transactional ext3 implementation by powering off the machine during a series of transactions. After the machine is powered back on, we mount the disk to replay any operations in the ext3 journal and run fsck on the disk to validate that it is in a consistent state. We then verify that all results from committed transactions are present on the disk, and that no partial results from uncommitted transactions are visible. To facilitate scripting, we perform these checks using Simics. Our system successfully passes over 1,000 trials, giving us a high degree of confidence that TxOS transactions correctly provide atomic, durable updates to stable storage.

7.6 Eliminating race attacks

System transactions provide a simple, deterministic method for eliminating races on system resources. To qualitatively validate this claim, we reproduce several race attacks from recent literature on Linux and validate that TxOS prevents the exploit.

We downloaded the symlink TOCTTOU attacker code used by Borisov et al. [6] to defeat Dean and Hu's probabilistic countermeasure [11]. This attack code creates memory pressure on the file system cache to force the victim to deschedule for disk I/O, thereby lengthening the amount of time spent between checking the pathname and using it. This additional time allows the attacker to win nearly every time on Linux.

On TxOS, the victim successfully resists the attacker by reading a consistent view of the directory structure and opening the correct file. The attacker's attempt to interpose a symbolic link creates a conflicting update that occurs after the transactional access check starts, so TxOS puts the attacker to sleep on the asymmetric conflict. The performance of the safe victim code on TxOS is statistically indistinguishable from the vulnerable victim on Linux.

To demonstrate that TxOS improves robustness while preserving simplicity for signal handlers, we reproduced two of the attacks described by Zalewski [57]. The first attack is representative of a vulnerability present in sendmail up to 8.11.3 and 8.12.0.Beta7, in which an attacker induces a double-free in a signal handler. The second attack, representative of a vulnerability in the screen utility, exploits the lack of signal handler atomicity. Both attacks lead to root compromise; the first can be fixed by using the sigaction API rather than signal, while the second cannot. We modified the signal handlers in these attacks by wrapping handler code in a sys_xbegin, sys_xend pair, which provides signal handler atomicity without requiring the programmer to change the code to use sigaction. In our experiments, TxOS serializes handler code with respect to other system operations, and therefore defeats both attacks.

7.7 Concurrent performance

System calls like rename and open have been used as ad hoc solutions for the lack of general-purpose atomic actions. These system calls have strong semantics (a rename is atomic within a file system), resulting in complex implementations whose performance does not scale. As an example in Linux, rename has to serialize all cross-directory renames on a single file-system-wide mutex because finer-grained locking would risk deadlock. The problem is not that performance tuning rename is difficult, but that it would substantially increase the implementation complexity of the entire file system, including unrelated system calls. Transactions allow the programmer to combine simpler

capture the semantics of container objects, such as a directory. Multiple transactions can concurrently and safely create files in the same directory so long as none of them use the same file name or read the directory. Unfortunately, creating a file in these systems requires a write lock on the directory, which serializes the writing transactions and eliminates concurrency. To compensate for the poor performance of reader-writer locks, both systems allow directory contents to change during a transaction, which reintroduces the possibility of the TOCTTOU race conditions that system transactions ought to eliminate. In contrast, TxOS implements system transactions with lazy version management, more sophisticated containers, and asymmetric conflict detection, allowing it to provide higher isolation levels while minimizing performance overhead.

Transactional Memory.

Transactional Memory (TM) systems provide a mechanism for atomic and isolated updates to application data structures. Transactional memory is implemented either as modifications to cache coherence hardware (HTM) [19, 32], in software (STM) [12], or as a hybrid of the two [8, 10].

Volos et al. [54] extend the Intel STM compiler with xCalls, which support deferral or rollback of common system calls when performed in a memory transaction. Because xCalls are implemented in a single, user-level application, they cannot isolate transaction effects from kernel threads in different processes, ensure durable updates to a file, or support multi-process transactions, all of which are needed to perform a transactional software installation and are supported by TxOS.

The system transactions supported by TxOS solve a fundamentally different problem from those solved by TxLinux [43]. TxLinux is a variant of Linux that uses hardware transactional memory as a synchronization primitive to protect OS data structures within the kernel, whereas TxOS exports a transactional API to user programs. The techniques used to build TxLinux enforce consistency for kernel memory accesses within short critical regions. However, these techniques are insufficient to implement TxOS, which must guarantee consistency across heterogeneous system resources, and which must support system transactions spanning multiple system calls. TxLinux requires hardware transactional memory support, whereas TxOS runs on currently available commodity hardware.

Speculator.

Speculator [35] applies an isolation and rollback mechanism to the operating system that is very similar to transactions, allowing the system to speculate past high-latency remote file system operations. Providing transactional semantics to user programs, as TxOS does, is a more complicated endeavor. In TxOS, transactions must be isolated from each other, while Speculator is designed for applications to share speculative results when they access the same data. Speculator does not eliminate TOCTTOU vulnerabilities. If a TOCTTOU attack occurred in Speculator, the attacker and victim would be part of the same speculation, allowing the attack to succeed. Speculator has been extended to parallelize security checks [36] and to debug system configuration [50], but does not provide ACID semantics for user-delimited speculation, and is thus insufficient for applications like atomic software installation/update.

Feature                                               Amino   TxF    Valor   TxOS
Low overhead kernel implementation                    No      Yes    Yes     Yes
Can be root fs?                                       No      Yes    Yes     Yes
Framework for transactionalizing other file systems   No      No²    Yes     Yes
Simple programmer interface                           Yes     No     No      Yes
Other kernel resources in a transaction               No      Yes³   No      Yes

Table 8: A summary of features supported by recent transactional file systems.

² Windows provides a kernel transaction manager, which coordinates commits across transactional resources, but each individual file system is still responsible for implementing checkpoint, rollback, conflict detection, etc.
³ Windows supports a transactional registry.

Transactional file systems.

TxOS simplifies the task of writing a transactional file system by detecting conflicts and versioning data in the virtual file system layer. Some previous systems, such as OdeFS [15], Inversion [38], and DBFS [34], provide a file system interface to a database, implemented as a user-level NFS server. These systems do not provide atomic, isolated updates to local disk, and cannot address the problem of coordinating access to OS-managed resources. Berkeley DB and Stasis [46] are transactional libraries, not file systems. Amino [56] supports transactional file operation semantics by interposing on system calls using ptrace and relying on a user-level database to store and manage file system data and metadata. Other file systems implement all transactional semantics directly in the file system, as illustrated by Valor [49], Transactional NTFS (also known as TxF) [44], and others [14, 45, 47].

Table 8 lists several desirable properties for a transactional file system and compares TxOS with recent systems. Because Amino's database must be hosted on a native file system, it cannot be used as the root file system. TxF can be used as the root file system, but the programmer must ensure that the local system is the two-phase commit coordinator if it participates in a distributed transaction. Like TxOS, Valor provides kernel support in the page cache to simplify the task of adding transactions to new file systems. Valor supports transactions larger than memory, which TxOS currently does not. Valor primarily provides logging and coarse-grained locking for files. Because directory operations require locking the directory, Valor, like QuickSilver, is more conservative than necessary with respect to concurrent directory updates.

In addition to TxF, Windows Vista introduced a transac-

A file system interface to an object-oriented database. In VLDB, 1994.
[16] S. Ghemawat, H. Gobioff, and S.-T. Leung. The Google file system. In SOSP, 2003.
[17] J. Gray. Notes on database operating systems. In Operating Systems, An Advanced Course. Springer-Verlag, 1978.
[18] J. Gray and A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
[19] L. Hammond, V. Wong, M. Chen, B. Carlstrom, J. Davis, B. Hertzberg, M. Prabhu, H. Wijaya, C. Kozyrakis, and K. Olukotun. Transactional memory coherence and consistency. In ISCA, June 2004.
[20] R. Haskin, Y. Malachi, and G. Chan. Recovery management in QuickSilver. ACM Trans. Comput. Syst., 6(1):82–108, 1988.
[21] M. Herlihy and E. Koskinen. Transactional boosting: A methodology for highly-concurrent transactional objects. In PPoPP, 2008.
[22] M. Herlihy, V. Luchangco, M. Moir, and W. N. Scherer III. Software transactional memory for dynamic-sized data structures. In PODC, 2003.
[23] M. P. Herlihy and J. M. Wing. Linearizability: a correctness condition for concurrent objects. ACM TOPLAS, 12(3), 1990.
[24] M. Kulkarni, K. Pingali, B. Walter, G. Ramanarayanan, K. Bala, and L. P. Chew. Optimistic parallelism requires abstractions. In PLDI, New York, NY, USA, 2007. ACM Press.
[25] J. Larus and R. Rajwar. Transactional Memory. Morgan & Claypool, 2006.
[26] B. Liskov, D. Curtis, P. Johnson, and R. Scheifler. Implementation of Argus. In SOSP, 1987.
[27] P. Magnusson, M. Christianson, and J. E. et al. Simics: A full system simulation platform. In IEEE Computer, Feb 2002.
[28] P. McDougall. Microsoft pulls buggy Windows Vista SP1 files. In InformationWeek. http://www.informationweek.com/story/showArticle.jhtml?articleID=206800819.
[29] P. E. McKenney. Exploiting Deferred Destruction: An Analysis of Read-Copy-Update Techniques in Operating System Kernels. PhD thesis, 2004.
[30] Microsoft. What is system restore. 2008. http://support.microsoft.com/kb/959063.
[31] C. C. Minh, J. Chung, C. Kozyrakis, and K. Olukotun. STAMP: Stanford transactional applications for multi-processing. In IISWC, 2008.
[32] K. E. Moore, J. Bobba, M. J. Moravan, M. D. Hill, and D. A. Wood. LogTM: Log-based transactional memory. In HPCA, 2006.
[33] M. J. Moravan, J. Bobba, K. E. Moore, L. Yen, M. D. Hill, B. Liblit, M. M. Swift, and D. A. Wood. Supporting nested transactional memory in LogTM. In ASPLOS, 2006.
[34] N. Murphy, M. Tonkelowitz, and M. Vernal. The design and implementation of the database file system, 2002.
[35] E. B. Nightingale, P. M. Chen, and J. Flinn. Speculative execution in a distributed file system. In SOSP, 2005.
[36] E. B. Nightingale, D. Peek, P. M. Chen, and J. Flinn. Parallelizing security checks on commodity hardware. In ASPLOS, 2008.
[37] NIST. National Vulnerability Database. http://nvd.nist.gov/, 2008.
[38] M. A. Olson. The design and implementation of the Inversion file system. In USENIX, 1993.
[39] W. Pugh. Skip lists: a probabilistic alternative to balanced trees. Communications of the ACM, 33:668–676, 1990.
[40] R. Rajwar and J. R. Goodman. Transactional lock-free execution of lock-based programs. In ASPLOS, 2002.
[41] H. Ramadan, C. Rossbach, D. Porter, O. Hofmann, A. Bhandari, and E. Witchel. MetaTM/TxLinux: Transactional memory for an operating system. In ISCA, 2007.
[42] H. E. Ramadan, I. Roy, M. Herlihy, and E. Witchel. Committing conflicting transactions in an STM. In PPoPP, 2009.
[43] C. Rossbach, O. Hofmann, D. Porter, H. Ramadan, A. Bhandari, and E. Witchel. TxLinux: Using and managing transactional memory in an operating system. In SOSP, 2007.
[44] M. Russinovich and D. Solomon. Windows Internals. Microsoft Press, 2009.
[45] F. Schmuck and J. Wylie. Experience with transactions in QuickSilver. In SOSP. ACM, 1991.
[46] R. Sears and E. Brewer. Stasis: Flexible transactional storage. In OSDI, 2006.
[47] M. I. Seltzer. Transaction support in a log-structured file system. In 9th International Conference on Data Engineering, 1993.
[48] A. Z. Spector, D. Daniels, D. Duchamp, J. L. Eppinger, and R. Pausch. Distributed transactions for reliable systems. In SOSP, 1985.
[49] R. Spillane, S. Gaikwad, M. Chinni, E. Zadok, and C. P. Wright. Enabling transactional file access via lightweight kernel extensions. In FAST, 2009.
[50] Y.-Y. Su, M. Attariyan, and J. Flinn. Autobash: improving configuration management with operating system causality analysis. In SOSP, 2007.
[51] D. Tsafrir, T. Hertz, D. Wagner, and D. D. Silva. Portably preventing file race attacks with user-mode path resolution. Technical report, IBM Research Report, 2008.
[52] D. Tsafrir, T. Hertz, D. Wagner, and D. D. Silva.