Tamper Evident Microprocessors

Adam Waksman
Department of Computer Science, Columbia University, New York, USA
waksman@cs.columbia.edu

Simha Sethumadhavan
Department of Computer Science, Columbia University, New York, USA
simha@cs.columbia.edu

Abstract: Most security mechanisms proposed to date unquestioningly place trust in microprocessor hardware. This trust, however, is misplaced and dangerous because microprocessors are vulnerable to insider attacks that can catastrophically compromise security, integrity and privacy of computer systems. In this paper, we describe several methods to strengthen the fundamental assumption about trust in microprocessors. By employing practical, lightweight attack detectors within a microprocessor, we show that it is possible to protect against malicious logic embedded in microprocessor hardware. We propose and evaluate two area-efficient hardware methods, TRUSTNET and DATAWATCH, that detect attacks on microprocessor hardware by knowledgeable, malicious insiders. Our mechanisms leverage the fact that multiple components within a microprocessor (e.g., fetch, decode pipeline stage etc.) must necessarily coordinate and communicate to execute even simple instructions, and that any attack on a microprocessor must cause erroneous communications between microarchitectural subcomponents used to build a processor. A key aspect of our solution is that TRUSTNET and DATAWATCH are themselves highly resilient to corruption. We demonstrate that under realistic assumptions, our solutions can protect pipelines and on-chip cache hierarchies at negligible area cost and with no performance impact. Combining TRUSTNET and DATAWATCH with prior work on fault detection has the potential to provide complete coverage against a large class of microprocessor attacks.
Index Terms: hardware security, backdoors, microprocessors, security based on causal structure and division of work.

Appears in Proceedings of the 31st IEEE Symposium on Security & Privacy (Oakland), May 2010. Free to distribute for educational use. Copyright restrictions may apply otherwise.

I. INTRODUCTION

One of the key challenges in trustworthy computing is establishing trust in the microprocessors that underlie all modern IT. The root of trust in all software systems rests on microprocessors because all software is executed by a microprocessor. If the microprocessor cannot be trusted, no security guarantees can be provided by the system. Providing trust in microprocessors, however, is becoming increasingly difficult because of economic, technological and social factors. Increasing use of third-party "soft" intellectual property components, the global scope of the chip design process, increasing processor design complexity and integration, the growing size of processor design teams and the dependence on a relatively small number of designers for a sub-component, all make hardware highly susceptible to malicious design.

A sufficiently motivated adversary could introduce backdoors during hardware design. For instance, a hardware designer, by changing only a few lines of Verilog code, can easily modify an on-chip memory system to send data items it receives to a shadow address in addition to the original address. Such backdoors can be used in attacking confidentiality, e.g., by exfiltrating sensitive information; integrity, e.g., by disabling security checks such as memory protection; and availability, e.g., by shutting down the component based on a timer or an external signal. Some recent high-profile attacks have been attributed to untrustworthy microprocessors [10]; hardware trust issues have been a concern for a while now in several domains, including in military and public safety equipment [67], and this issue has attracted media attention lately [45].

Because hardware components (including backdoors) are architecturally positioned at the lowest layer of a computational device, it is very difficult to detect attacks launched or assisted by those components: it is theoretically impossible (see footnote 2)
to do so at a higher layer, e.g., at the operating system or application, and there is little functionality available in current processors and motherboards to detect such misbehavior. The state of practice is to ensure that hardware comes from a trusted source and is maintained by trusted personnel, a virtual impossibility given the current design and manufacturing realities. In fact, our inability to catch accidental bugs with traditional design and verification procedures, even in high-volume processors [59], makes it unlikely that hidden backdoors will be caught using the same procedures, as this is an even more challenging task (see footnote 3).

Footnote 2: It should be noted, however, that in practice it may be possible to detect discrepancies in the state of the system, such as cache misses. Such detection cannot be guaranteed, and it largely depends on both external artifacts used for the detection (e.g., a reference time source) and on sub-optimal implementation of the backdoor.

Footnote 3: The International Technology Roadmap for Semiconductors notes that the number of bugs escaping traditional audit procedures will increase from five to nine per 100,000 lines of code in the coming years [2].

In this paper we investigate how microprocessor trust can be strengthened when manufactured via an untrusted design flow. Figure 1 shows the standard steps used to manufacture microprocessors. This paper focuses on one of the initial production steps, which is the coding phase of hardware design (register transfer level, or RTL). Any backdoor introduced during the initial phase becomes progressively more difficult to catch as it percolates through optimizations and tools in the later phases. Prior work on detecting attacks on hardware by malicious foundries [12, 17, 16, 24, 40, 53, 67] assumes as a starting point the availability of a trusted RTL model, called a golden netlist. Our work aims to provide this trusted, golden netlist.

[Figure 1: Microprocessor design flow and scope of this paper.]

The traditional approach to building trustworthy systems from untrustworthy components is to redundantly perform a computation on several untrustworthy components and use voting to detect faulty behavior. For example, N processors designed by different designers can run the same instructions, and the most popular output can be accepted. This solution, however, is not viable for microprocessors because it increases the initial design cost significantly by increasing the size of the design team and verification complexity of the design. This solution also increases the recurring operational costs by decreasing performance and increasing power consumption.

In this paper, we describe a novel method for building a trustworthy microprocessor (at low cost) from untrusted parts, without the duplication required by the N-version model. Our technique exploits the standard division of work between different sub-components (or units) within a microprocessor, universally available in microprocessor designs. We do this by recognizing simple relationships that must hold between on-chip units. The underlying observation that drives our technique is that the execution of any instruction in a microprocessor consists of a series of separate but tightly coupled microarchitectural events. For example, a memory instruction, in addition to using a cache unit, needs to use the fetch, decode and register units. We take advantage of this cooperation in order to detect tampering by noticing that if one unit misbehaves, the entire chain of events is altered.

We explain our technique with an analogy: say, Alice, Bob and Chris are involved in a fundraiser. Alice is the Chief Financial Officer, Chris is a donor, and Bob is a malicious accountant. Let us say Chris makes a donation of $100 towards the fundraiser and makes the payment to Bob. Let us also say Alice follows all probable donors on Twitter so that she can send a thank you note as soon as donors post tweets on their charitable deeds. Chris tweets: "Donated $100 to charity." Malicious Bob swipes $10 off and reports to Alice that Chris only donated $90. Of course, Alice catches Bob because she can predict Bob's output based on Bob's input from Chris. Applying this analogy to our microprocessor, a malicious cache unit cannot send two outputs when in fact only one memory write instruction has been decoded. Any unit that observes the output of the instruction decoder and output of the cache will be able to tell that tampering has happened along the way.

Our method relies on the fact that cooperating units are not simultaneously lying, a reasonable assumption because high-level design engineers on a microprocessor project are typically responsible for only one or few processor units but not all [26, 46]. Using these relationships, our system, called TRUSTNET, is able to provide resilience against attacks to any one unit, even if that unit is a part of TRUSTNET itself. Further, TRUSTNET does not require that any specific unit is trusted.

A second system, called DATAWATCH, watches select data on the chip in order to protect against attacks that alter data values without directly changing the number of outputs. Continuing on the previous analogy, this would be a case where Bob, the evil accountant, passed on the full $100, but passed on Canadian dollars instead of American dollars, keeping the difference for himself. When DATAWATCH is active, Chris' tweet would contain the fact that he donated American dollars, tipping off Alice about Bob's crime.

In this paper, we evaluate the resiliency of TRUSTNET and DATAWATCH against a set of attacks implementable in RTL during the initial processor design steps. We show that TRUSTNET and DATAWATCH protect the pipeline and cache memory systems for a microprocessor closely matching the Sun Microsystems' OpenSPARC T2 processor against a large class of attacks at the cost of negligible storage (less than 2 KB per core) and no performance loss. Additionally, TRUSTNET and DATAWATCH, in concert with pre-existing solutions (partial duplication [25]), can provide coverage against many known hardware design level backdoors.

In summary, the primary contributions of this paper are:

- We present a taxonomy describing the attack space for microprocessor designs. The key observation that forms the basis of this taxonomy is that a microprocessor attack can only change the number of instructions or corrupt instructions.
- We present a novel, general solution that exploits the division of work and causal structure of events inherent in microprocessors for detecting a large class of attacks created during the initial stages of microprocessor design by knowledgeable, venal, malicious insiders.
- To the best of our knowledge, we are the first to propose using violation of co-operation invariants in microprocessors to detect malicious attacks.

The rest of the paper is organized as follows: Section II describes related work. Section III describes the threat model, assumptions of our study and a taxonomy of attacks. In Section IV we describe our solution. Section V presents evaluation. We conclude and present directions for future research in Section VI.

[Figure 2: Proposed work in the context of broader work on hardware threats. Prior countermeasures against hardware threats rely on a trusted microprocessor, which this work aims to provide.]

II. RELATED WORK

Microprocessors are one part of a large ecosystem of hardware parts that forms the trusted computing base. There has been a significant amount of work over the past several decades on protecting different aspects of the ecosystem (Figure 2
). In this section, we discuss threats and countermeasures against all classes of hardware, not just microprocessors.

So far hardware, collectively the processor, memory, Network Interface Cards, and other peripheral and communication devices, has been primarily susceptible to two types of attacks: (1) non-invasive side-channel attacks and (2) invasive attacks through external untrusted interfaces/devices. We define an attack as any human action that intentionally causes hardware to deviate from its expected functionality.

Physical side-channel attacks compromise systems by capturing information about program execution by analyzing emanations such as electromagnetic radiation [31, 33, 42, 47, 53] or acoustic signals [15, 44, 60] which occur naturally as a byproduct of computation. These attacks are an instance of covert channels [39] and were initially used to launch attacks against cryptographic algorithms and artifacts (such as "tamper-proof" smart cards [43, 37]) but general-purpose processors are also pregnable to such attacks. There have been several attacks that exploit weaknesses in caches [5, 8, 19, 21, 48, 49, 50, 51, 52] and branch prediction [6, 7, 9]. Some countermeasures against these threats include self-destructing keys [32, 35, 62, 72], new circuit styles that consume the same operational power irrespective of input values [27, 38, 58, 64, 65], and microarchitectural techniques [11, 22, 63, 66, 69].

Invasive untrusted device attacks typically are carried out by knowledgeable insiders who have physical access to the device. These insiders may be able to change the configuration of the hardware, causing system malfunction. Examples of such attacks include changing the boot ROM, RAM, disk or more generally external devices to boot a compromised OS with backdoors, or stealing cryptographic keys using unprotected JTAG ports [13, 56]. A countermeasure is to store data in encrypted form in untrusted (hardware) entities. Since the 1980s there has been significant work in this area [61]. Secure co-processors [28, 35] and Trusted Platform Modules [4] have been used to secure the boot process. More recently, enabled by VLSI advances, researchers have proposed continuous protection of programs and on-chip methods for communication with memory and I/O integration [29, 40].

A new threat that has recently seen a flurry of activity is intentional backdoors in hardware. As hardware development closely resembles software development both in its global scope and liberal use of third party IP, there is growing interest and concern in hardware backdoors and their applications to cyber offense and defense. Broadly speaking, work in this area can fall into one of three categories: threats and countermeasures against malicious designers, threats and countermeasures against malicious design automation tools, and threats and countermeasures against malicious foundries. There has been some work on detecting backdoors inserted by malicious foundries that typically relies on side-channel information such as power for detection [12, 16, 17, 24, 41, 54, 57, 70]. There has been no work on providing countermeasures against malicious designers, which this work aims to address.

There have been a few unconfirmed incidents of design-level hardware attacks [10] and some work in academia on creating hardware backdoors. Shamir et al. [20] demonstrate how to exploit bugs in the hardware implementation of instructions. King et al. [36] propose a malicious circuit that can be embedded inside a general-purpose CPU and can be leveraged by attack software executing on the same system to launch a variety of attacks. They demonstrate a number of such hybrid software/hardware attacks, which operate at a much higher abstraction level than would generally be possible with a hardware-only attack. Although they do not discuss any protection or detection techniques, their work is particularly illuminating in demonstrating the feasibility and ease of creating such attacks through concrete constructs.

III. THREAT MODEL

A malicious hardware designer has to be strategic in creating backdoors because processor development, especially commercial development, is a carefully controlled process. Broadly speaking, the attacker has to follow two steps: first, design a backdoor for an attack, and second, build a trigger for the attack. Just like regular design, the attacker has to handle trade-offs regarding degrees of deception, time to completion, verification complexity, and programmability. In this section we discuss these trade-offs for attack triggers (Section III-B) and attack backdoors (Section III-C). However, we begin our discussion by detailing assumptions in our threat model.

A. Assumptions

Assumption #1: Division of Work. Typically, a microprocessor team is organized into sub-teams, and each sub-team is responsible for a portion of the design (e.g., fetch unit or load-store unit). Microprocessor design is a highly cooperative and structured activity with tens to hundreds of participants [14]. The latest Intel Atom Processor, for instance, is reported to have had 205 "Functional Unit Blocks" [3]; a design of a recent System-on-Chip product from STMicroelectronics is reported to have required over 200 engineers hierarchically organized into eight units [1]. We assume that any sub-unit team in a design can be adversarial but that not more than one of the sub-units can be simultaneously compromised. While adversarial nation-states could possibly buy out complete teams to create undetectable malicious designs, it is more likely that attackers will be a small number of "bad apples."

Assumption #2: Access. The focus of this work is to detect the handiwork of malicious microprocessor designers, which includes chip architects, microarchitects, RTL engineers and verifiers, and circuit designers. These workers have approved access to the design, privilege to change the design, and an intricate knowledge of the microprocessor design process and its workings. A malicious designer will be able to provision for the backdoor either during the specification phase, e.g., by allocating "reserved bits" for unnecessary functions, or by changing the RTL. We assume this will be unnoticed during the implementation phase and after the code reviews are complete. Our assumption that code audits will not be able to catch all backdoors is justified because audits are not successful at catching all inadvertent, non-malicious design bugs.

Assumption #3: Extent of Changes. The malicious designer is able to insert a backdoor: (i) using only low tens of bits of storage (latches/flops etc.), (ii) with a very small number of logic gates and (iii) without cycle level re-pipelining. This assumption does not restrict the types of attacks allowed. However, we assume the attacker is clever enough to implement the changes in this way. This assumption ensures that the malicious designer can slip in the hardware backdoor unnoticed past traditional audit methods with very high probability.

Assumption #4: Triggers. Although an unintentional bug can have the same consequences as a malicious backdoor, a critical difference is that unlike a bug, a backdoor may not be always active. If the backdoor is always active, there is a high chance of detection during random, unit-level design testing. To avoid detection, the malicious designer is likely to carefully control when the backdoor is triggered.

Assumption #5: ROMs. We assume that ROMs written during the microprocessor design phase contain correct data. In particular, we assume that microcoded information is correct. The reason for this assumption is that the data in ROMs is statically
determined and not altered by the processor's state. For this reason, we consider this security issue to be better solved statically than at runtime.

B. Attack Triggers

An RTL level attacker can use two general strategies for triggering an attack: a time-based trigger or a data-based trigger. From the RTL perspective, input data and the passage of time are the only factors determining the state of the microprocessor (any attack using environmental factors would be a side-channel attack; we are concerned with attacks using digital input signals), so these two strategies or some combination of them are the only ones possible.

Trigger #1: Cheat Codes (CC). A malicious designer can use a sequence of uncommon bits, embedded in either the instruction or data stream, to unlock/lock the backdoor. For instance, a store instruction to a specific address and a certain value (one pairing in a 2^128 space for a 64-bit microprocessor) can be used as a key to unlock a backdoor. Since the search space is so large, the chance that this trigger is hit by random verification is negligible. King et al. describe a variant of this attack in which a sequence of instructions in a program unlocks a trigger. The CC method gives an attacker a very high degree of control on the backdoor but may require a reasonably sophisticated state machine to unlock the backdoor. Further, it requires execution of software that may not be possible due to access restrictions. This is due to the fact that in order to ensure the 'magic' instruction(s) is issued, the attacker must execute a program containing that instruction(s). If the attacker cannot obtain access privileges, then this will not be possible.

Trigger #2: Ticking Timebomb (TT). An attacker can build a circuit to turn on the backdoor after the machine has been powered on for a certain number of cycles. The TT method is very simple to implement in terms of hardware; for instance, a simple 40-bit counter that increments once per processor clock cycle can be used to open a backdoor after roughly 18 minutes of uptime at 1 GHz. Unlike the CC method, TT triggers do not require any special software to open the backdoor. However, like CC triggers, TT triggers can easily escape detection during design validation because random tests are typically not longer than millions of cycles.

C. Backdoor Types

While the space of possible attacks is limited only by the attacker's creativity and access to the design, attacks can be broadly classified into two categories, based on their runtime characteristics. We observe that an attacker can either create a hardware backdoor to do more (or less) work than the uncompromised design would, or he/she can create a backdoor to do the same amount of work (but work that is different from that of an uncompromised unit). By work, we mean the microarchitectural sub-operations or communications that must be carried out for the execution of an instruction. This is a complete, binary classification.

Emitter Backdoors (EB). An emitter backdoor in a microarchitectural unit explicitly sends a different number of microarchitectural communications than an uncompromised unit. An example of an emitter backdoor in a memory unit is one that sends out loads or stores to a shadow address. When this type of attack is triggered, each memory instruction, upon accessing the cache subunit, sends out two or more microarchitectural transactions to downstream memory units in the hierarchy. Similar attacks can also be orchestrated for southbridge (I/O control hub) components, such as DMA and VGA controllers, or other third party IP, to exfiltrate confidential data to unauthorized locations.

Corrupter Backdoors (CB). In this type of attack, the attacker changes the results of a microarchitectural operation without directly changing the number of microarchitectural transactions. We consider two types of corrupter backdoors: control corrupters and data corrupters.

A control corrupter backdoor alters the type or semantics of an instruction in flight in a way that changes the number of microarchitectural transactions somewhere else on-chip (e.g., at a later cycle). These attacks are similar to emitter attacks, except that instead of simply issuing an extra instruction, they use some part of a legitimate instruction in order to change the number of transactions happening on-chip. For example, if a decode unit translates a no-op instruction into a store instruction, this will indirectly cause the cache unit to do more work than it would in an untampered microprocessor. However, this change will not manifest itself until a later cycle. This is different from an emitter attack because the decode unit does not insert any new transactions directly; it decodes exactly the same number of instructions in the tampered and untampered case, but the value it outputs in the tampered case causes the cache unit to do more work a few cycles later.

Data corrupter backdoors alter only the data being used in microarchitectural transactions, without in any way altering the number of events happening on-chip during the life of the instruction. Examples of this could include changing the value being written to a register file or changing the address on a store request. For instance, an instruction might be maliciously decoded to turn an addition into a subtraction, causing the ALU to produce a difference value instead of a sum value.
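The trigger arithmetic and the backdoor taxonomy of this section can be illustrated with a minimal Python sketch. It is not part of the original design: the names `timebomb_delay_minutes`, `Trace`, and `classify` are hypothetical, and summarizing an instruction's "work" as two transaction counts plus a data payload is an assumed simplification of the microarchitectural behavior described above.

```python
from dataclasses import dataclass


def timebomb_delay_minutes(counter_bits: int, clock_hz: float) -> float:
    """Minutes of uptime before a free-running counter of this width wraps."""
    cycles = 2 ** counter_bits          # one increment per clock cycle
    return cycles / clock_hz / 60.0


@dataclass(frozen=True)
class Trace:
    """Assumed per-instruction summary of the work done around one unit."""
    unit_outputs: int        # transactions sent by the unit itself
    downstream_outputs: int  # transactions later caused elsewhere on-chip
    data: tuple              # values carried by those transactions


def classify(reference: Trace, observed: Trace) -> str:
    """Label the deviation of an observed trace from the untampered one."""
    if observed.unit_outputs != reference.unit_outputs:
        return "emitter"             # the unit itself does more or less work
    if observed.downstream_outputs != reference.downstream_outputs:
        return "control corrupter"   # same count here, extra work later
    if observed.data != reference.data:
        return "data corrupter"      # same amount of work, different values
    return "clean"


# 40-bit counter at 1 GHz: about 18.3 minutes, matching the text's estimate.
delay = timebomb_delay_minutes(40, 1e9)

# Shadow store: the memory unit sends two transactions instead of one.
shadow_store = classify(Trace(1, 1, ("store",)), Trace(2, 1, ("store",)))

# No-op decoded as a store: decode output count unchanged, cache works later.
nop_to_store = classify(Trace(1, 0, ("nop",)), Trace(1, 1, ("store",)))

# Addition decoded as subtraction: all counts unchanged, only the value differs.
add_to_sub = classify(Trace(1, 1, ("add",)), Trace(1, 1, ("sub",)))
```

The order of the checks mirrors the classification in the text: a change in the unit's own transaction count is tested first, then a change that only appears downstream, and only then a pure change in data.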
Footnote 4 (on data corrupter backdoors): Data corrupter backdoors can be used to change program flow, for example by changing a value in a register, thus changing the result of a future 'branch-if-equal' instruction. However, each individual instruction will still do the same amount of work as it should. The extra work will not occur until the corrupt instruction has been committed. Thus each instruction considered individually will appear to be doing the correct amount of work.

Emitter vs. Corrupter Trade-offs. From the attacker's point of view, emitter attacks are easy to implement. Emitter attacks, such as shadow loads, have very low area and logic requirements. They also have the nice property (for the attacker) that a user may not see any symptoms of hardware emitters when using applications. This is because they can preserve the original instruction stream. Often in prior work the term 'backdoor' actually means 'emitter backdoor.'

Corrupter attacks, on the other hand, are more complicated to design and harder to hide from the user. In fact, a control corrupter attack requires strictly more logic than a similar emitter attack because rather than simply sending a trigger, it must hide the trigger within a live instruction (which involves extra multiplexing or something equivalent). In these attacks, rather than simply emitting bogus signals, the user's own instructions are altered to invoke the attack. Since the user's instructions are being altered, the attacker must have some knowledge of the binaries being run to change the data without tipping off the user. If the execution of the backdoor caused the user's program to crash, this would violate the secrecy of the attack. Corrupter attacks also scale poorly with datapath sizes, since they require decoding of user instructions. In the case of multi-stage decoders, the backdoor itself may require latches and execute over multiple cycles.

To summarize, the "biggest bang for the buck" for the attacker is from ticking-timebomb emitter attacks. They can be implemented with very little logic, are not dependent on software or instruction sequences, and can run to completion unnoticed by users. In the following section, however, we discuss strategies for defending against all types of backdoors and triggers.

IV. PRINCIPLES FOR MICROPROCESSOR PROTECTION

We propose as a solution to the untrusted designer problem an on-chip monitoring system that recognizes malicious behavior at runtime, regardless of the trigger or unit. Different attacks require different defenses. As such, we present our solution in four flavors. We first describe low overhead solutions for emitter and control corrupter protection, called TRUSTNET and DATAWATCH. We then describe how a form of partial duplication, which we call 'smart duplication,' can be used against some data corrupters. For data corrupters not protected by any of the above mechanisms, we recommend full duplication. For this initial study, we discuss our solutions in the context of simple microprocessors that do not re-order instructions.

A. Emitter Backdoor Protection

Emitter backdoors by definition cause more (or fewer) microarchitectural transactions to occur in the corrupted unit than the instruction specifies. We designed the TRUSTNET monitoring system to watch the microarchitectural transactions in each unit and catch this class of attacks. Conceptually, the system detects violations of deterministic communication invariants between on-chip units, which are violated by emitter backdoors.

Toward this end, we designed the prediction/reaction monitor triangle, depicted in Figure 3. A triangle consists of three different on-chip units: a predictor, a reactor, and a target (monitored unit in Figure 3). The predictor unit sends messages to the monitor, predicting events that should come out of the target unit. If the reactor does not receive a predicted event, or if the reactor receives an event that was never predicted, then the monitor throws an alarm.

[Figure 3: Overview of the TRUSTNET and DATAWATCH monitoring scheme.]

The target unit is any unit on-chip. For one example, using Figure 4
as reference, consider the decode unit (IDU) as a target. The predictor unit can be any unit that sees events before they reach the decode unit, for example the fetch unit (IFU). The fetch unit sees every instruction before it reaches the decode unit. The reactor unit can be any unit that sees events after they pass through the target. For example, it can be the execution unit (EXU), because that unit always receives information about instructions after they pass through the decode unit. The monitor itself can reside anywhere on-chip. The IFU knows each cycle whether or not it has fetched a new instruction. The EXU knows at each cycle whether or not a valid instruction is advancing through the pipeline. Having these two sources of knowledge corroborate prevents the IDU from ever inserting bogus instructions into the pipeline. Nothing extra has to be computed, since this knowledge is already represented by signals in the respective units.

A vital aspect of this design is that the target unit never communicates with the monitor. Therefore, even if the designer of unit X knows the design of the monitor (which is likely), the designer is unable to corrupt the monitor of X. For this reason, it is important that the monitor of X should not be physically part of unit X. Recall the assumption from Section III
that only one sub-team is corrupt, and so an attacker cannot corrupt two independent units on-chip. This assumption guarantees that our system is secure against the attack space. Consider the monitor set up to watch some unit X. There are four items in play: the predictor of X, which we will call P; the reactor to X, which we will call R; X itself; and the monitor of X, which we will call M. The attacker must choose one and only one of these items to corrupt. In order to cause a confidentiality or integrity attack in X, the attacker would have to choose to corrupt X itself. However, in that case P, R, and M are all untampered with, and the attack is caught at runtime by the P-R-M monitor. The attacker could instead tamper with P or R (but not both), but in either case that attack would be caught at runtime, since P and R would disagree with each other. The last choice is to corrupt M to cause an availability attack. However, since M contains only one (or a few) logic gates, it is too small to contain a cheat code or ticking time bomb. M has so few states (usually none) and input bits (usually two) that it can easily be verified statically by exhaustive enumeration.

B. Control Corrupter Backdoor Protection

Recall that control corrupters change the semantics of an instruction in flight. The same predictor/reactor scheme, used to catch emitter attacks, works to detect control corrupter attacks, since these attacks either cause reactions (microarchitectural transactions) that are not predicted or prevent transactions that are predicted. To this end, we present DATAWATCH, the natural extension of TRUSTNET. DATAWATCH uses the same triangle scheme as TRUSTNET. However, instead of transmitting individual bits to monitors, signatures (e.g., hashes, parity bits, slices) are passed to monitors that check for equivalence. For example, in the case of the decoder, a two bit signature is enough to define whether an instruction involves a load, a store, both, or neither (two bits for four cases). An equality check on the signatures from the predictor and the reactor is enough to confirm an on-chip invariant, namely that the number of each type of regular memory operation that enters the IDU equals the number of each type of memory operation that exits the IDU. This prevents, for instance, an attack that exfiltrates data by converting no-ops into store instructions.

Data corrupter attacks change only the data that is sent in on-chip communications. These backdoors are fundamentally different from the types previously discussed because the amounts and types of communications between units during the execution of an instruction are identical to those of a correctly functioning chip. The monitor triangle, while very efficient for recognizing amounts and types of transactions, does not work well for this case, because data corrupter attacks cannot be recognized without duplicating some of the computational logic that has been corrupted. For example, if the EXU (execution unit) produces an incorrect sum, the fact that the sum is wrong cannot be known without duplicating (or otherwise performing the job of) the ALU (arithmetic/logic unit).

However, this type of attack has some similarities with transient errors that can occur in microprocessors. Significant work has been done toward transient error detection [25, 55, 71, 23] and fault tolerance, and we draw on the principles of some of this prior work. It is sufficient in many cases to duplicate select computational logic in order to protect the RTL design, since standard memory structures (e.g., RAMs) are not susceptible to RTL level attacks. We propose that this type of minimal duplication, which we call 'smart duplication,' can be used in a case-by-case way to protect any units (e.g., memory control unit) that are not covered by the DATAWATCH system or any units that may be considered vulnerable to data corrupter attacks. This partial duplication allows for protection against data corrupter attacks. However, it does this at the possible cost of processor stalls and extra area, and as explained previously (Sec. III-C), in most domains data corrupter attacks would likely be considered infeasible because the attacker would need to know, during the RTL design phase, the binaries that will be run in the future. Therefore, this technique may only be useful in a few select domains or not at all.

Table I summarizes some of the attributes of the offered solutions. None of the proposed solutions have a problem with false positives (false alarms) because they use invariants that can be easily determined statically in non-speculative, in-order microprocessors. Extending this solution to designs with advanced speculative techniques, such as prefetching, may make false positive avoidance non-trivial. False negatives (missed attacks) are only a problem if multiple signals in the DATAWATCH technique are hashed to save space, because two different values may hash to the same key, thus tricking the equality checker. However, hashing is an implementation option, which we chose to avoid because the space requirement of the baseline DATAWATCH system is fairly low.

TABLE I. COMPARISON OF TRUSTNET, DATAWATCH, AND SMART DUPLICATION FOR SIMPLE, IN-ORDER MICROPROCESSORS

Attribute                      TRUSTNET     DATAWATCH    Smart Duplic.
Data Corrupter Detection       No           No           Yes
Control Corrupter Detection    No           Yes          Yes
Emitter Detection              Yes          Yes          Yes
False Positives                No           No           No
Stalls Processor               No           No           Possibly
Area Cost                      Negligible   Low          Moderate

On-Chip Traffic: Increases
Off-chip memory: No effect
Backdoor source: Can be identified

[Figure 4: Units and communication in the hypothetical in-order processor used in this study.]

C. A Case Study

To demonstrate the principles of the TRUSTNET and DATAWATCH techniques we describe how they can be applied to a hypothetical non-speculative, in-order microprocessor. The in-order microprocessor used in this study closely models the cores and cache hierarchy of the OpenSPARC T2 microprocessor with the exception of the crossbar network between core and memory system, the thread switching unit, and the chip system units such as the clock and test units. For this study, the units in the processor core are partitioned as described in the OpenSPARC T2 documentation and we used the open source RTL code to identify the predictors and reactors for each unit. The following are the TRUSTNET monitoring triangles we implemented, categorized by the unit being monitored:

#1 IDU: The primary responsibility of the IDU is to decode instructions. Predicted by the IFU and reacted to by the EXU, the IDU monitor confirms each cycle that a valid instruction comes out of the IDU if and only if a valid instruction entered the IDU. This monitor detects any attack wherein the IDU inserts spurious instructions into the stream. In the case of branch and jump instructions, which do not go all the way through the pipeline, the information travels far enough for the EXU to know that a branch or jump is occurring. This monitor can be extended to support a speculative microprocessor if the monitor can reliably identify speculative instructions.

#2 IFU: The primary responsibility of the IFU is to fetch instructions. Predicted by the I-Cache and reacted to by the IDU, this monitor confirms each cycle that a valid instruction comes out of the IFU if and only if an instruction was fetched from the I-Cache. This invariant catches any attack wherein the IFU sneaks instructions into the stream that did not come from the I-Cache. The monitor operates on the level of single instructions as opposed to whole cache lines. While the whole line is loaded into the I-Cache from the L2, the I-Cache knows when individual instructions are being fetched into the IFU.

#3 LSU: The load-store unit (LSU) handles memory references between the SPARC core, the L1 data cache and the L2 cache. Predicted by the IDU and reacted to by the D-Cache, this monitor confirms each cycle that a memory action (load or store) is requested if and only if a memory instruction was fed into the LSU. This catches shadow load or shadow store attacks in the LSU. Our microprocessor uses write merging, which could have been a problem, since several incoming write requests are merged into a single outgoing write request. However, there is still a signal each cycle stating whether or not a load/store is being initiated, so even if several writes are merged over several cycles, there is still a signal each cycle for the monitoring system.

#4 I-Cache: Predicted by the IFU and reacted to by the unified L2 Cache, this confir
mseachcyclethatanL2instructionloadrequestisreceivedintheL2CacheifandonlyifthatloadcorrespondstoafetchthatmissedintheI-Cache.TheIFUcanpredictthisbecauseitreceivesan`invalid'signalfromtheI-Cacheonamiss.AnI-CachemissimmediatelytriggersanL2requestandstallstheIFU,sothereisnoissuewithcachelinesize.TheIFUbuffersthispredictionuntilthereactionisreceivedfromtheL2Cache.ThiscatchesshadowinstructionloadsintheI-Cache.#5D-Cache:PredictedbytheLSUandreactedtobytheL2Cache,thisisthesameasthemonitor#4butwatchesdatarequestsinsteadofinstructionrequests.#6L2Cache:PredictedbytheI-CacheandreactedtobyMMU,thisisthesameasmonitor#4butisonelevelhigherinthecachehierarchy.#7L2Cache:PredictedbytheD-CacheandreactedtobytheMMU,thisisthesameasmonitor#5butisonelevelhigherinthecachehierarchy.#8D-Cache:PredictedbytheLSUandreactedtobytheL2Cache,thisisthesameasmonitor#5butwatcheswritesinsteadofreads.Itisnecessarythattwoseparatemonitorswatchreadsandwrites;ifasinglemonitorcountedonlythetotalnumberofreadsandwrites,thenanattackercouldconvertawriteintoareadunnoticed.Thiswouldcauseolddatatobeloadedintothecacheandpreventthenewvaluefrombeingwritten.#9L2Cache:PredictedbytheD-CacheandI-CacheandreactedtobytheMMU,thisconrmsthatlineaccessesintheMMUcorrespondtolineaccessesissuedbythelevel1caches.Thismonitorpreventsshadowloads/storesexecutedbytheL2Cache.ThefollowingaretheDATAWATCHmonitoringtrianglesweimplemented,categorizedbytheunitbeingmonitored:#10IFU:PredictedbytheIDUandreactedtobytheI-7 !"#$%&'(")*(+#,)-#.&'(")*(+#, /0."1 !"# !"#$%"&$#'()#$% *"+,#-..),/"'$01&,2$0%"&)#$% 2#3)*(+#,)!"#$%&'(")*(+#,45'65')*"+,#3)44,&. 
Fig.5.TRUSTNETMonitorMicroachitecture.Cache,thisconrmseachcyclethatiftheI-CachereceivesavalidPCvalueitisthesameasthevaluecomputedintheIFU.ThisrequiredsomeduplicationofPClogicbutdidnotrequireanyextrastorage.ThispreventsattackswhereintheIFUmaliciouslychangesthecontrolow.#11DataTLB:PredictedbythecheckerdataTLBandreactedtobytheLSU,thisconrmseachcyclethattheoutputsofthedataTLBmatchtheoutputsofthecheckerdataTLB.ThispreventsdataTLBattacks,suchaspermissionsviolationsorpagemis-translation.Thisisontheborderlineofwhatwewouldstarttocall`smartduplication'becausethe`signatures'aresolarge.However,weincludedthisisourDATAWATCHimplementationandsimulation.#12InstructionTLB:PredictedbythecheckerinstructionTLBandreactedtobytheIFU,thisisthesameasmonitor#11butfortheinstructionTLBratherthanthedataTLB.#13IDU:PredictedbytheIFUandreactedtobytheLSU,thisconrmsthatthenumberofinstructionsdecodedintomemoryoperationsmatchesthenumberofmemoryinstructionsfetched.Forourmicroprocessor,thisrequiredthattheIFUlookedatafewbitsoftheinstruction.Themonitoringoccursataonecyclelag,sothetimingonthecriticalpathisunaffected.TheIFUstoresafewofthebitsfromthefetchedinstructioninip-opsuntilthenextcycle,whenapredictioncanbemadewithafewlogicalgates.Forourcasestudy,thisistheonlytypeofcontrolcorrupterdecoderattackweaddress.Thereasonforthisisthatinoursimplemicroprocessor,theonlytypesofsignalsthedecodercancauseareloadsinstores(if,forexample,thedecoderchangedanaddtoasubtract,thiswouldbeadatacorrupter,becauseitwouldnotalterthenumberoftransactionsintheexecutionunit,justthevalueoftheoutput).Inmorecomplexmicroprocessors,decodeunitsmayberesponsibleformoretypesoftransactionsandmightrequireadditionalmonitoringtriangles.WhencustomizingaDATAWATCHsystemtotaparticulardesign,itisimportantupfronttoidentifywhattypesofsignalseachunitisresponsiblefor.D.MicroarchitectureandOptimizationsThemicroarchitectureofthepredictorandmonitorunitsaredepictedinFigure 5 
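The counting invariants used by the TRUSTNET triangles in the case study can be illustrated with a small Python sketch. It is not the RTL implementation: the class name `TriangleMonitor`, the trace encoding (`'R'`, `'W'` or `None` per cycle) and the `corrupt_cycle` attack hook are assumptions made purely for illustration. The sketch raises an alarm whenever a prediction and the observed reaction disagree, and it shows why reads and writes need separate monitors (as with monitors #5 and #8).

```python
from collections import deque


class TriangleMonitor:
    """Toy TRUSTNET monitor: alarm when prediction and reaction disagree."""

    def __init__(self, name, delay=0):
        self.name = name
        # Predictions wait `delay` cycles for the matching reaction
        # (a fixed delay is an assumption of this sketch).
        self.buffer = deque([False] * delay)
        self.alarms = []

    def step(self, cycle, predicted, reacted):
        self.buffer.append(bool(predicted))
        expected = self.buffer.popleft()
        if expected != bool(reacted):
            self.alarms.append(cycle)


def run(trace, corrupt_cycle=None):
    """Replay LSU requests ('R', 'W' or None per cycle) through a D-Cache.

    If corrupt_cycle is given, a hypothetical malicious D-Cache converts the
    write issued in that cycle into a read before it reaches the L2 Cache.
    """
    combined = TriangleMonitor("reads+writes")  # single counter (insufficient)
    reads = TriangleMonitor("reads")            # in the spirit of monitor #5
    writes = TriangleMonitor("writes")          # in the spirit of monitor #8
    for cycle, issued in enumerate(trace):
        seen = "R" if (cycle == corrupt_cycle and issued == "W") else issued
        combined.step(cycle, issued is not None, seen is not None)
        reads.step(cycle, issued == "R", seen == "R")
        writes.step(cycle, issued == "W", seen == "W")
    return combined.alarms, reads.alarms, writes.alarms


trace = ["R", None, "W", "W", None]
clean = run(trace)
attacked = run(trace, corrupt_cycle=2)
print(clean)     # no alarms without an attack
print(attacked)  # only the separate read and write monitors fire
```

In this toy run the combined monitor stays silent when the write in cycle 2 is turned into a read, while both dedicated monitors record an alarm for that cycle.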
The predictor unit consists of (i) event buffers for delaying the issue of tokens to the monitor and (ii) token issue logic to determine when buffered events can be released from the event buffers. The predictor unit requires a small buffer because it is possible for multiple predictions to happen before a reaction happens, and these predictions must be remembered for that duration. These buffers can be sized a priori to avoid overflows. The monitor itself simply checks if events appear on the predictor and reactor inputs during the same cycle.

1) TRUSTNET Optimization: When designing the TRUSTNET system to catch emitter backdoors, we considered it to be important that the monitors fit simply into the pipeline without any complex timing or buffering issues. Since predictions and reactions must arrive at the monitor during the same cycle, timing must be controlled in the face of non-determinism, which arises in all microprocessors due to cache misses, etc. We handled this differently in the case of the memory hierarchy and in the case of the pipeline.

The pipeline offers a natural lock-step manner for coordinating events. If a reaction stage is N pipeline steps down from a prediction stage, then the prediction stage has a size N buffer that advances only when that stage of the pipeline advances. Since the monitoring network advances in lock-step with pipelined events, timing is not a problem. For example, if the third pipeline stage wants to send a prediction to a monitor that lies in the fifth pipeline stage, this will take two pipeline advancements (no need for forwarding). If the third stage stalls for any reason, the prediction also stalls and gets buffered. When the data from the third stage reaches the fifth stage, the prediction token will also arrive. Of course, the prediction token should not pass through the fourth stage but should instead remain in the prediction buffer, with a bit denoting that it is semantically in the fourth stage.

In the case of the cache hierarchy, on the other hand, it is necessary to know which predictions correspond to which reactions, because it is possible for memory requests to be handled out of order. This requires time-stamping of packets, for example with a one byte local time signature copied from an 8-bit modular counter.

2) DATAWATCH Optimization: A naïve solution for catching control corrupter backdoors in TLBs (translation lookaside buffers) is to simply have two (or more) designers design the same TLB and compare their outputs each cycle. Since TLBs tend to be power-hungry, highly associative structures, duplication is not a good idea. Instead of complete duplication, we propose a new TLB microarchitecture that provides significant protection without the costs associated with duplication. The TLBs contain page translation and permissions information not available elsewhere on chip. A TLB consists of a CAM that translates a virtual page into a physical page, which is then stored in a table (RAM) with the corresponding permissions information for that physical page.

The basic idea of our method is to create a "checker" direct-mapped structure that has the same functionality as a TLB, the motivation being that a direct-mapped structure uses a fraction of the power of an associative one. The TLBs in our case study are fully associative. We added functionality to the CAMs to output the line number of the output. This allowed us to build a checker TLB that uses these line numbers. Essentially, instead of having one CAM and a direct-mapped RAM (as is normal), we have one CAM and two direct-mapped RAMs that operate in parallel. The CAM provides matching entries to both RAMs in parallel. One of those RAMs communicates with the rest of the chip while the other RAM only gives outputs to a monitor (equality verifier). The equality check occurs at a one cycle latency, so the values are buffered for that cycle.

Naturally, the CAM could be tampered with so that it sends incorrect line numbers to the checker TLB. This would cause the equality check to fail because data from one line of the original TLB's RAM will be compared to data from a different line of the second RAM, causing an alarm to be thrown. Therefore, our checker TLB turns a potential confidentiality or integrity attack into at worst an availability attack. We note that this availability attack would also be easy to catch at verification time because the passing of the line number is simple, combinatorial logic that can be checked by exhaustive enumeration. While this duplication is much more expensive than the simpler monitor used for emitter backdoor protection, it is much less expensive than complete duplication and offers strong protection for a highly vulnerable unit.
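The checker TLB organization described above (one CAM feeding two direct-mapped RAMs, with an equality check on their outputs) can be sketched in a few lines of Python. This is a behavioural toy under stated assumptions, not the hardware design: the class `CheckedTLB`, the tuple layout of an entry and the `checker_line_fault` tampering hook are hypothetical names, and the one cycle latency of the equality check is not modelled.

```python
class CheckedTLB:
    """One CAM feeding two direct-mapped RAMs; only one RAM drives the chip."""

    def __init__(self, entries):
        # entries: list of (virtual_page, physical_page, permissions)
        self.cam = {vpage: line for line, (vpage, _, _) in enumerate(entries)}
        self.main_ram = [(ppage, perms) for _, ppage, perms in entries]
        self.checker_ram = list(self.main_ram)  # second RAM, same contents
        # Hypothetical CAM tampering: maps a line to the wrong line number
        # that is sent to the checker RAM.
        self.checker_line_fault = {}

    def lookup(self, vpage):
        """Return (translation, alarm); translation is None on a TLB miss."""
        line = self.cam.get(vpage)
        if line is None:
            return None, False
        main_out = self.main_ram[line]
        checker_line = self.checker_line_fault.get(line, line)
        checker_out = self.checker_ram[checker_line]
        # Equality verifier: any disagreement raises the alarm.
        return main_out, main_out != checker_out


tlb = CheckedTLB([(0x10, 0xA0, "r"), (0x11, 0xA1, "rw")])
honest = tlb.lookup(0x11)

# Attack 1 (illustrative): the main RAM silently upgrades permissions.
tlb.main_ram[0] = (0xA0, "rw")
escalated = tlb.lookup(0x10)

# Attack 2 (illustrative): the CAM sends a wrong line number to the checker.
tlb2 = CheckedTLB([(0x10, 0xA0, "r"), (0x11, 0xA1, "rw")])
tlb2.checker_line_fault[1] = 0
misrouted = tlb2.lookup(0x11)
```

In both attack cases the lookup still returns a translation, but the alarm flag is set, which matches the observation that tampering degrades into an availability problem rather than a silent confidentiality or integrity breach.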
E. Applications of Prior Solutions

As we mentioned briefly in the introduction, the problem of building trusted systems from untrustworthy components is a classic problem that has received some attention in the systems community. A common solution used to amplify trust in corruptible processes is to use the N-version model of computation. The basic idea is to have N entities perform the same computation and compare the N outputs to check for untrustworthy behavior. In this section, we expand on the different ways in which this concept can be applied to microprocessors and discuss the advantages and disadvantages.

To deal with untrusted designers in the context of microprocessors, one option is to have N designers create N versions of each unit within a processor, which would all be run continuously to check for untrustworthy behavior. Alternately, one could run a program on N different systems that implement the same ISA but are manufactured by different vendors, say, boards that have x86 processors from AMD, Intel and Centaur. The latter suffers from high power overhead while the former suffers from both high design cost per chip and high runtime costs. Another solution that avoids only the runtime cost is to statically and formally check the design units from N designers for equivalence. This approach increases the design cost and does not scale to large designs or designs that are vastly different. According to the 2007 ITRS roadmap, only 13.8% of a normal microprocessor design specification is formalized for verifiability [2]. All common solutions to this problem appear unsatisfactory in the context of microprocessors.

Another option is to use static verification to identify backdoors. There has been extensive prior work on static verification of RTL level designs [68][18][34]. Static verification involves confirming functional equivalence between a behavioral level golden model (e.g., a C program) and the RTL level design under test. The difficulty lies in the fact that the input space for a microprocessor grows exponentially with the number of input interfaces and the internal state size, which makes the functional domain catastrophically large. Exhaustive comparison is unrealistic, so the state of the art is to use probabilistic approaches that attempt to obtain reasonable coverage, such as equivalence checking [30][68], model checking [30], and theorem proving [30]. These approaches can work for small units, particularly ones with little or no state, such as ALUs. Unfortunately, static verification is increasingly becoming the bottleneck in the microprocessor design process [30] and is becoming less reliable [2].

A fundamental weakness of static verification techniques when it comes to backdoor detection is that they attempt to use a stationary weapon to hit a moving target. Static methods choose specific targets for comparison or invariants to confirm about small portions of the design. Since it is reasonable to assume that a malicious insider would have full knowledge of the static verification technique being used, he or she would most likely design the backdoor to avoid the space covered by these techniques. For example, he or she would likely make sure not to violate any of the theorems being verified and to avoid regions being formally checked for equivalence.

V. EVALUATION

The goals of our evaluation were to: (1) study the accuracy and coverage provided by TRUSTNET and DATAWATCH, (2) measure the increases in on-chip network congestion from DATAWATCH running on real programs and (3) measure the area overheads of both mechanisms. We do not discuss performance since the proposed mechanisms do not stall the pipeline, memory system, or any other on-chip unit, and security packets travel on a dedicated network.

A. Applicability

This section addresses the general applicability and limitations of our solution, including related aspects and potential extensions.

Scope of our solution: Our implementation of TRUSTNET and DATAWATCH was designed for a simple, in-order microprocessor. While the methodology is applicable to any in-order microprocessor, this exact implementation only works for the microprocessor in our case study. In order to fit TRUSTNET and DATAWATCH to other designs, it is necessary to analyze the units at a high level and determine what the natural predictors and reactors are. In future work, we hope to develop a tool that automates this process.

Level of our solution: Our solution is at the RTL level and
thus can only catch attacks that operate on the RTL level. Post-RTL, circuit level attacks, such as tampering with the voltage thresholds on certain transistors, would not be caught by our system. Our solution covers the cores and the cache hierarchy of the OpenSPARC T2 microprocessor but does not cover debug/test logic or miscellanies, such as clock distribution. Additionally, side-channel attacks are not covered.

Multiple attackers: The solution we implemented works only under the assumption that at most one of the design teams is corrupt. Our design is a triangle (complete graph of size three). If we remove the assumption that only one of the design subteams is corrupt and allow for $n$ different subteams to be corrupt and fully coordinated, then the solution must become more complicated. In order for a TRUSTNET system to catch $n$ coordinated attackers, it is necessary to form a complete graph of size $n+2$. The premise of the system is that two honest units must communicate with each other to compare information and detect discrepancies. If there are at most $n+1$ nodes and $n$ have been tampered with, then at most one of them is honest, and the one honest node receives only invalid information. Therefore, the size of the graph must be at least $n+2$ so that there are at least two honest nodes. If the graph is not complete, i.e. it is missing an edge, then it is possible that the missing edge connects the only two honest nodes. In that case, the two honest nodes receive only invalid information. Therefore, the graph must be complete. Since complete graphs contain $n(n+1)/2$ bidirectional edges, the TRUSTNET solution, when extended to $n$ attackers for a microprocessor with $u$ units, has a fundamental communication overhead on the order of $n^2 u$. The conclusion is that even though TRUSTNET and DATAWATCH are generalizable for multiple, coordinated attacking subteams, they do not scale well. We present this generalized scheme only for completeness.

Alarms: The decision of how to handle an alarm is domain specific and not a fundamental aspect of our monitoring system. However, we present initial suggestive thoughts on what might be done with an alarm. In our experimental implementation, the alarm was simply recorded and not used for any corrective actions. The simplest response to an alarm is to kill the defective processor, which turns a confidentiality or integrity attack into an availability attack. In highly secure domains, this may be desirable to guarantee no exfiltration of sensitive data. Additionally, in a heterogeneous processor (diversity) setting, it may be desirable to kill the defective core. We also note that using the TRUSTNET and DATAWATCH infrastructure has the potential to greatly simplify the task of making microprocessors tamper corrective. If an alarm is sounded, the problem can be corrected by rolling back to the last committed instruction. Additionally, the instruction that was in flight in the corrupted unit can be flagged as a cheat code and logged for future execution. This approach would be analogous to a honeypot.

Extensions to General Microprocessors: There are several ways to generalize the TRUSTNET and DATAWATCH architecture, and each way poses challenges for future work. The multi-threaded case is a relatively simple generalization that can be implemented by making the packets n-wide for an n-threaded core. Assuming one thread is not supposed to alter the microarchitectural transactions of another thread, the n-wide packet can function semantically as n independent monitors. The out-of-order case is more complicated as it requires our mechanisms to be extended to handle reordering of in-flight predictor/reactor tokens. Handling speculative techniques would also require extensions, though we believe that the principles of our system can be applied to work in this case without any false alarms by identifying what the lifetime of an instruction is (whether it is prefetched, speculated or committed) and monitoring it for that lifetime. There are other advanced features of modern microprocessors, and each may warrant its own attention in future work. For example, some microprocessors have a privileged or supervisor state that is separate from the permissions governed by the TLB. Such additions would open the door for control corrupter attacks and would warrant additional monitoring triangles.

TABLE II
EXPERIMENTAL INFRASTRUCTURE
Instruction Set | Sun SPARC
Microarchitecture, Instruction supply | 16KB, 8-way 1R/1W L1 I-cache, 64-entry FA I-TLB (both 2-cycle access, 52 cycles on TLB miss), no branch prediction, stall until branch resolution.
Microarchitecture, Execution | Single issue, 1 INT ALU, T2 SPARC register windows.
Microarchitecture, Data supply | 8KB, 4-way L1 D-cache 1RW, 128-entry FA D-TLB (both 3 cycle access, 53 cycles on TLB miss, write-back policy), unified 4MB, 16-way L2 cache, 1RW (both 12 cycle access, write-back policy), unlimited main memory at 250 cycle access latency.
Microarchitecture, Pipeline Stages | Fetch, Cache, Pick, Decode, Execute, Read, Bypass, Writeback.
Benchmarks | bzip2, gcc, mcf, gobmk, hmmer, test inputs, base compiler optimizations, SPARC compiler

B. Evaluation Methodology

We demonstrate our design on a simplified model of Sun Microsystems' OpenSPARC T2 microarchitecture. We chose this architecture and instantiation because it is the only "industrial-strength" hardware design that is also available as open source. While our experiments and analysis were performed on our simulated core, based on the OpenSPARC T2 microprocessor design, we use nothing unique to that design, and we believe our techniques can in principle be applied to any microprocessor that has a memory hierarchy and pipelines. In our case study, we used the RTL hardware implementation (1) to construct well-formed, meaningful attacks to test the resiliency of the system and (2) to systematically determine the number of on-chip units that can be covered by our design. In addition, to measure congestion, similar to many computer architecture studies, we use a cycle-accurate simulator that exactly models one core of our microprocessor. The details
of our simulation infrastructure are summarized in Table II. We implemented all the TRUSTNET and DATAWATCH monitor triangles discussed in this paper (Tables III, IV) including the partially duplicated TLBs.

C. Attack Space Coverage

To determine how good TRUSTNET and DATAWATCH are at protecting against attacks on microprocessors, we first need to measure the microprocessor attack/vulnerability space. To measure the attack/vulnerability space, we observe that an on-chip unit is only vulnerable to backdoors in-so-far as its interfaces are threatened. What goes on inside the unit doesn't matter so long as everything that goes in and out of it is correct. If all inputs and outputs are the same as in an uncorrupted chip, then there is no problem, because there has been no corruption or exfiltration of data. Therefore, to identify the points of vulnerability, we record the interfaces between on-chip units. The efficacy of our solution is then determined by whether or not these interfaces are protected from attacks using TRUSTNET and DATAWATCH.

Figure 6 (A, B, C, D) shows the distribution of shared interfaces between units within the overall chip, the processor core, the memory elements and system elements respectively, in the RTL implementation of the OpenSPARC T2. Each measurement in the grid represents the number of signals that enter and leave a unit within the processor. All communication is normalized to the unit that has the highest communication. Very small bars (below 0.1) signify connections that are not part of instruction processing, but rather contain miscellaneous information, such as power settings, clock distribution, etc. (attacks on these may be possible at fabrication, e.g., mis-clocking a certain unit, but would be difficult at the RTL code level). Most of these miscellaneous signals are not present in our case study. Thus, the microprocessor in our case study is somewhat simpler than a real OpenSPARC T2 and lacks some of the less central features, such as power management and floating point operations.

As can be seen from Tables III and IV, which describe the known emitter and corrupter monitors respectively, and Figure 6, which describes all the interfaces, all of the units in the core that have significant communications (more than 0.1) are monitored by TRUSTNET and DATAWATCH. For this study, we used manual analysis to identify which signals on the interfaces can be altered to cause emitter and control corrupter attacks, in order to verify that TRUSTNET and DATAWATCH can cover these signals. Based on this analysis, most of the vulnerable interfaces can be protected against the known attacks. However, our manual analysis may have failed to exhaust all possible attack scenarios; in future work, we hope to automate this process to guarantee full coverage.

While TRUSTNET and DATAWATCH cover the processor cores and the cache hierarchy against the emitter and control corrupter attacks we identified, we did not cover the floating point logic, memory controller units, crossbars and the miscellaneous logic. We believe that formal verification can be used to cover floating point logic as is already in vogue. Similarly, we believe that the crossbar in our case study is simple enough to be amenable to formal verification. We are not sure if our strategy can cover the miscellaneous logic such as the power management unit and the stream processing unit (we have not had time to understand the implementation) but these units have very little contact with other units on chip (less than 0.1 normalized), and we believe it is unlikely that they will significantly increase attack vulnerability. The difficulty with covering the MCU with TRUSTNET is that the MCU communicates with only the main memory and that there is nothing on-chip that can serve as a reactor. The simplest way to handle this special case would be to duplicate the MCU control logic, which serves the same purpose as complete duplication but without increasing memory bandwidth.

D. Attacks and Detection

Based on our study of the OpenSPARC T2 RTL, we constructed the emitter and control corrupter attacks described in Table V.
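One of the control corrupter attacks of this kind, a decoder that transforms a no-op into a load, can be illustrated with a toy Python model in the spirit of monitor #13. Everything concrete here is an assumption for illustration only: the opcode names, the two-bit signature encoding in `SIGNATURE`, and the `backdoor` dictionary used to model a tampered decoder are hypothetical and not taken from the OpenSPARC T2 RTL.

```python
# Hypothetical 2-bit signature: bit 0 marks a load, bit 1 marks a store.
SIGNATURE = {"load": 0b01, "store": 0b10}


def signature(op):
    """Signature the predictor or reactor derives from an operation."""
    return SIGNATURE.get(op, 0b00)


def decode(op, backdoor=None):
    """Decoder model; `backdoor` maps a fetched op to a tampered op."""
    return (backdoor or {}).get(op, op)


def run(program, backdoor=None):
    """Return the cycles in which the monitor raises an alarm."""
    alarms = []
    for cycle, fetched in enumerate(program):
        predicted = signature(fetched)                  # IFU (predictor)
        reacted = signature(decode(fetched, backdoor))  # seen by LSU (reactor)
        if predicted != reacted:
            alarms.append(cycle + 1)  # the check runs at a one cycle lag
    return alarms


program = ["add", "load", "nop", "store"]
no_attack = run(program)
shadow_load = run(program, backdoor={"nop": "load"})
data_corrupter = run(program, backdoor={"add": "sub"})
```

The shadow load is flagged one cycle after the tampered no-op is fetched. Turning an add into a subtract leaves the signature unchanged, so this monitor does not see it, which is consistent with that case being classified as a data corrupter rather than a control corrupter.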
We injected these attacks into our simulator to measure the percentage of false negatives and false positives. We did not implement the data corrupter attacks. These attacks, such as the one which alters the address of an otherwise valid load, are situations where some logic duplication may be required. In this case, the address (or a hash of the address) could be forwarded to make sure it has not been altered. However, this was not done in our implementation, which protected only against emitter and control corrupter attacks. As stated earlier, we also did not take any corrective or rollback measures with alarms; we only recorded them.

As was expected, all emitter and control corrupter attacks were caught in all cases. This is very important because it demonstrates that our system provides coverage for all of the units we applied it to and for various types of attacks. We also measured the overall accuracy of our solution with no attacks, as measured by the percentage of the cycles in which there are no false positives thrown. For all tests run, no false positives occurred. It is vital that there are no false positives and no false negatives because the latter would be a breach of security and the former would cripple the system.

TABLE III
DESCRIPTIONS OF THE EMITTER PROTECTION MONITORS FOR OUR IMPLEMENTATION
Monitored Unit | Predictor | Reactor | Invariant | Example of attack thwarted
IDU | IFU | EXU | # of instructions in = # of instructions out | IDU stalls the fetch unit and sends malicious commands to the EXU
IFU | I-Cache | IDU | # of instructions in = # of instructions out | IFU sends spurious instructions to the IDU
LSU | IDU | D-Cache | # of memory ops issued = # of memory ops performed | LSU performs shadow loads
I-Cache | IFU | L2 Cache | # of requested L2 instructions = # of IFU requests that miss | I-Cache returns spurious instructions to IFU while waiting on the L2 Cache
L2 Cache | I-Cache | MMU | # of requested instructions from memory = # of I-Cache requests that miss in L2 | L2 Cache returns spurious instructions while waiting on main memory
D-Cache | LSU | L2 Cache | # of requested L2 data = # of LSU requests that miss | D-Cache returns fake data while waiting on the L2 Cache
L2 Cache | D-Cache | MMU | # of requested data from memory = # of D-Cache requests that miss in L2 | L2 Cache returns spurious data while waiting on main memory
D-Cache | LSU | L2 Cache | # of L2 cache lines written = # of LSU line writes issued | D-Cache sends write to L2 cache unprompted
L2 Cache | D-Cache | MMU | # of memory lines written = # of D-Cache line writes issued | L2 sends write to memory unprompted
Legend: IDU = decode unit, IFU = fetch unit, LSU = load/store unit, I-Cache = instruction cache, D-Cache = data cache, L2 Cache = unified L2 cache

TABLE IV
CORRUPTER PROTECTION MONITORS
Monitored Unit | Predictor | Reactor | Invariant | Example of attack thwarted | Type of signature
IFU | IDU | I-Cache | PC received = PC computed | IFU branches incorrectly | Eight bit signature
D-TLB | Checker D-TLB | LSU | TLB output = checker TLB output | TLB violates permissions | Full permissions and translation
I-TLB | Checker I-TLB | IFU | TLB output = checker TLB output | TLB violates permissions | Full permissions and translation
IDU | IFU | LSU | Memory ops issued = memory ops performed | Decoder causes shadow load/store | Two bit signature
Legend: IFU = fetch unit, IDU = decode unit, TLB = translation lookaside buffer, LSU = load/store unit, I-Cache = instruction cache

E. Traffic

Since TRUSTNET and DATAWATCH do not stall the pipeline or otherwise increase computational cycles, the most relevant cost of the system is the increase in on-chip network traffic. This increase depends on the architecture, but it can be bounded in general if we assume a cache hierarchy and one or more pipelined computational units. The total amount of traffic in the worst case is bounded above as per the following equation:

\[ \text{traffic} \le 2 \times (\text{MemoryOps} \times \text{MemoryMonitors} + \text{Instructions} \times \text{PipelineMonitors}) \]

The factor of two comes from the fact that each monitoring event consists of two packets: a prediction and a reaction. This is a loose upper bound, and we expect real programs to produce far less than this much traffic.
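The worst-case traffic bound above is straightforward to evaluate. The Python sketch below simply encodes the inequality's right-hand side; the function names and all numeric inputs in the example (operation counts, monitor counts and cycle count) are made-up illustrative values, not measurements from our simulator.

```python
def traffic_upper_bound(memory_ops, memory_monitors,
                        instructions, pipeline_monitors):
    """Worst-case monitoring packets: a prediction plus a reaction per event."""
    return 2 * (memory_ops * memory_monitors
                + instructions * pipeline_monitors)


def packets_per_cycle_bound(cycles, **counts):
    """Same bound, normalized by the number of simulated cycles."""
    return traffic_upper_bound(**counts) / cycles


# Hypothetical workload: 1000 instructions, 300 of them memory operations,
# observed by 3 pipeline monitors and 6 memory monitors over 1500 cycles.
bound = traffic_upper_bound(memory_ops=300, memory_monitors=6,
                            instructions=1000, pipeline_monitors=3)
per_cycle = packets_per_cycle_bound(1500, memory_ops=300, memory_monitors=6,
                                    instructions=1000, pipeline_monitors=3)
print(bound, per_cycle)
```

Because the bound assumes that every memory operation is seen by every memory monitor and every instruction by every pipeline monitor, it overestimates the traffic of a real program, where most memory operations hit in the first-level caches and never reach the monitors further down the hierarchy.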
However, this upper bound demonstrates our design's scalability. This linear scaling with the IPC and the pipeline depth is optimal (up to constant factors) given that we want to monitor every pipeline stage and every instruction.

We experimentally measured how much monitoring network traffic is generated by real programs with two questions in mind: (1) Are there programs that create floods of traffic (near the worst-case bound)? (2) Do high-level differences between programs affect the amount of traffic caused by our monitors? Our expectation was that the different programs would have little impact on the amount of traffic produced by the monitors. As Figure 7 shows, the differences between programs do not significantly impact the EPC (events per cycle) of our system.

Figure 7 displays the number of communications per cycle sent between TRUSTNET monitors during executions of SPEC integer benchmarks. These numbers are deterministic because the monitors behave deterministically and the instructions are in order. The traffic generated is relatively low (always less than 2 per cycle). It is also stable across the benchmarks (between 1.1 and 1.2). This supports our belief that a single model works for all programs and that program adaptive features would be unnecessary. These numbers would be higher for a program that, for example, consisted of only store instructions or only branch instructions, but we do not anticipate such behavior in real programs.

Fig. 6. An overview of the communications that occur in a real OpenSPARC T2 microprocessor. (A) displays a partition of the microprocessor into four basic parts: 'System' includes interfaces, clock generators, and other system level features. 'Memory' includes cache banks, non-cacheable units, and other memory structures. The core represents one processor core (there are eight cores in all). The crossbar coordinates communications between the cores and the cache banks (which are partitioned on chip). (B), (C), and (D) show internal communications going on within the system, memory, and cores.

Fig. 7. Events per cycle created by the TRUSTNET monitoring scheme for SPEC benchmarks. An event is any communication between two on-chip units. A prediction and a reaction count as two separate events.

F. Area Estimates

In this section, we provide bounds on the general area cost of TRUSTNET and DATAWATCH and estimate the cost of the implementation in our case study. We use bytes of storage as our metric because the computational logic required is trivial (XORs, buffer logic, or equality check over a few bits).

The area cost of our monitors comes from the fact that an event must be stored by the monitoring system from the time it reaches the predictor to the time it reaches the reactor. In complex processors, this time can be variable. It is necessary to have buffers large enough to store all events that are still incomplete. This number depends on the architecture but is known a priori for a given microprocessor. Therefore:

\[ \text{BufferPackets} \le \text{MaxMemoryRequests} + \text{MaxInstructionsInPipeline} \]

In the single-issue, in-order case, each packet is a single bit. Additionally, if there are N threads sharing a pipeline, the data must be N bits wide instead of one, so that no thread-swapping attacks are possible. So in general:

\[ \text{Area} \le (\text{MaxMemoryRequests} + \text{MaxInstructionsInPipeline}) \times \text{PacketSize} \]

Specifically, TRUSTNET, as described in Table III, employs nine different triangles. It is sufficient to use a one byte prediction buffer for each triangle at the input (although in most cases less would suffice). Analysis of an OpenSPARC T2 core shows that it is impossible for a one byte prediction buffer (eight slots) to overflow. This makes a total of at most nine bytes of storage. Using maximal scaling, i.e., conservative scaling with no microarchitectural optimizations, would require $9 \times 8 = 72$ bytes to cover an eight-threaded OpenSPARC T2 core. An OpenSPARC T2 chip, which contains eight cores, would require eight copies of TRUSTNET for a total of $72 \times 8 = 576$ bytes of storage.
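The TRUSTNET storage arithmetic above, together with the general area bound, can be reproduced with a short Python sketch. The function and parameter names are illustrative assumptions; the default values (nine triangles, one byte per prediction buffer, eight threads, eight cores) are the figures quoted in the text, while the inputs to `area_bound_bits` in the example are hypothetical.

```python
def buffer_packets_bound(max_memory_requests, max_instructions_in_pipeline):
    """Upper bound on the number of events that can be incomplete at once."""
    return max_memory_requests + max_instructions_in_pipeline


def area_bound_bits(max_memory_requests, max_instructions_in_pipeline,
                    packet_size_bits):
    """General storage bound: buffered packets times the packet width."""
    return buffer_packets_bound(max_memory_requests,
                                max_instructions_in_pipeline) * packet_size_bits


def trustnet_storage_bytes(triangles=9, buffer_bytes=1, threads=8, cores=8):
    """Return (single-thread core, multi-threaded core, whole chip) in bytes."""
    single_thread = triangles * buffer_bytes
    per_core = single_thread * threads   # conservative scaling per thread
    per_chip = per_core * cores          # one TRUSTNET copy per core
    return single_thread, per_core, per_chip


single_thread, per_core, per_chip = trustnet_storage_bytes()
print(single_thread, per_core, per_chip)

# Hypothetical machine: 4 outstanding memory requests, 8 instructions in
# flight, single-bit packets.
example_bits = area_bound_bits(4, 8, 1)
```

With the default parameters the sketch returns 9, 72 and 576 bytes, matching the per-thread, per-core and per-chip figures derived above.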
,employsfouraddi-tionaltrianglesontopofTRUSTNET.Thetwotrianglesforthepipelineuseeight-widepredictionbuffersofonebytesignatures,foratotalofeightbyteseach.Ifwecreatethetwotrianglesonalleightcores,thatmakes288=128totalbytesofstorage.Includingtheduplicatedirect-mappedTLBs(bothdataandinstruction)addsatotalof128+64=192duplicatedTLBentries.Ifwedothisforeachoftheeightcoresandgiveeachlineagenerous9bytesofstorage,thisadds89192=13824bytesofstorage.ThenDATAWATCHusesatotalof128+13824=13952bytesofstorageontopofTRUSTNET,foratotalof13952+576=14528bytes,oralittleunder15KBofstorage(totalfor8coresandthecachehierarchy).VI.CONCLUSIONOneofthelong-standingclassicproblemsinsystemsse-curityis“Howtobuildtrustworthysystemsfromuntrust-worthycomponents?”Inthispaperwestudyandproposeasolutionforavariantoftheproblem:“Howtobuildtrustworthymicroprocessorsfromuntrustworthycomponentsbuiltbyuntrusteddesigners?”Sinceallsoftwareandhardwareisunderthecontrolofmicroprocessors,establishingtrustinmicroprocessorsisacriticalrequirementforestablishingtrust13 TABLEVSOMEHYPOTHETICALATTACKSONANINORDERMICROARCHITECTURE.THESEATTACKSWERECONCEIVEDBYMANUALANALYSISOFTHEOPENSPARCT2RTL(INSPIREDBY[ 36 ])ANDIMPLEMENTEDINASIMULATORTOTESTOURDESIGNS.THISARRAYOFATTACKSTHREATENSEVERYPIPELINESTAGEASWELLASTHEMEMORYSYSTEM.THESEATTACKSCANVIOLATECONFIDENTIALITY,INTEGRITY,ANDAVAILABILITY.ONLYTHEEMITTERANDCONTROLCORRUPTERATTACKSWEREIMPLEMENTEDINOURCASESTUDY.THEDATACORRUPTERATTACKSAREDISCUSSEDINTHISPAPERANDPROVIDEDHEREFORREFERENCEBUTWERENOTIMPLEMENTED. OpenSPARCUnit Attack PossibleUserLevelEffect BackdoorType Protection IFU Fetchinstructionfromwrongaddress FetchamaliciousprograminsteadoftheonetheOSintends. 
ControlCorrupter #10 IFU Fetchextrainstructions FetchamaliciousprograminadditiontotheonetheOSintends Emitter #2 IDU Emitspuriousinstructions Emitaspuriousloadorstoretoprivateinformation Emitter #1 IDU Transformno-opintoloadorstore Allowinappropriateloadorstore ControlCorrupter #13 ITLB Translatepagesincorrectly Translateavalidloadintoaloadfromamaliciousprogram ControlCorrupter #12 ITLB ChangeorIgnorepermis-sions Allowloadingfrompageswithoutpermissions ControlCorrupter #12 IL1 Loadswronginstruction FetchamaliciousprograminsteadoftheonetheOSintends DataCorrupter duplic IL1 Loadsextrainstruction FetchamaliciousprograminadditiontotheonetheOSintends Emitter #4 EXU Incorrectoperation ALUproducesincorrectoutput;Widespreaddamage DataCorrupter verif. V-C EXU Incorrectoperation Computewrongaddress DataCorrupter verif. V-C LSU Loads/Storesextradata Load/storeprivateinformation Emitter #3 DL1 Loadsextradata Loadprivateinformation Emitter #5#8 DL1 LoadsfromwronglocationinUL2 Loadprivateinformation DataCorrupter duplic. DL1 Storesextradata Exltrateprivateinformation Emitter #5#8 UL2 Loadsextradata Loadprivateinformation Emitter #6#7#9 UL2 LoadsfromwronglocationinRAM Loadprivateinformation DataCorrupter duplic. 
UL2 | Loads/stores extra data | Overwrite OS-critical information | Emitter | #6 #7 #9
MC | Loads/stores extra data | Overwrite OS-critical information | Emitter | IV-A
DTLB | Translates data location incorrectly | Translate a valid load into a load of private information | Control Corrupter | #11
DTLB | Changes permissions | Allow loading from pages without permissions | Control Corrupter | #11
DTLB | Ignores permissions | Allow loading from pages without permissions | Control Corrupter | #11

in computing bases. We classified the set of possible RTL-level design attacks into three categories and explained the trade-offs among the categories. As a solution to the untrusted microprocessor designer problem, we proposed TRUSTNET, a dynamic verification engine that continuously monitors communications to detect violations of deterministic communication invariants between on-chip units. TRUSTNET keeps track of the microarchitectural events required to execute an instruction and reports a discrepancy when a microarchitectural unit does more or less work than is expected. We also proposed a more robust system, DATAWATCH, which watches not only the number of events that happen but also the type of events that happen. Within these two systems, each unit within a processor (the actor) is monitored by two other units, a predictor unit and a reactor unit. The predictor unit supplies inputs to the actor unit, and the reactor unit receives outputs from the actor. By tracking predictions and reactions, TRUSTNET and DATAWATCH detect malicious modifications to a chip.

TRUSTNET and DATAWATCH are capable of detecting major categories of microprocessor attacks without complete replication (a classic textbook solution for such problems), at low design complexity, for a small area investment, and with no performance impact. Based on our evaluation of the OpenSPARC T2 RTL, we determined that TRUSTNET takes up less than 1 KB of storage to catch emitter attacks. We also determined that DATAWATCH can protect the cores and the cache hierarchy from known emitter and control corrupter attacks at the cost of less than 2 KB of storage per processor core. Lastly, we discussed how logic in the rest of the design can be duplicated in order to provide more robust coverage for high-security domains at a fraction of the cost of complete duplication (the current state of practice).

The ideas behind TRUSTNET, viz. using the causal structure of microarchitectural operations in concert with the division of work between processor units, open up exciting opportunities to optimize over traditional techniques used to improve the reliability and availability of microprocessors. For instance, TRUSTNET- and DATAWATCH-like infrastructure may be used to detect transient faults and for dynamic verification without traditional duplication- or diversity-based techniques.

VII. ACKNOWLEDGEMENTS

We thank Edward Suh and the anonymous reviewers for their detailed comments. We also thank Sal Stolfo and members of the security, architecture and computer systems group at Columbia University for valuable feedback on this work. This work was supported by an instrumentation grant from AFOSR (FA99500910389).