Energy Disaggregation via Discriminative Sparse Coding

J. Zico Kolter
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology, Cambridge, MA 02139
kolter@csail.mit.edu

Siddarth Batra, Andrew Y. Ng
Computer Science Department, Stanford University, Stanford, CA 94305
{sidbatra,ang}@cs.stanford.edu


Abstract

Energy disaggregation is the task of taking a whole-home energy signal and separating it into its component appliances. Studies have shown that having device-level energy information can cause users to conserve significant amounts of energy, but current electricity meters only report whole-home data. Thus, developing algorithmic methods for disaggregation presents a key technical challenge in the effort to maximize energy conservation. In this paper, we examine a large-scale energy disaggregation task, and apply a novel extension of sparse coding to this problem. In particular, we develop a method, based upon structured prediction, for discriminatively training sparse coding algorithms specifically to maximize disaggregation performance. We show that this significantly improves the performance of sparse coding algorithms on the energy task and illustrate how these disaggregation results can provide useful information about energy usage.

1 Introduction

Energy issues present one of the largest challenges facing our society. The world currently consumes an average of 16 terawatts of power, 86% of which comes from fossil fuels [28]; without any effort to curb energy consumption or use different sources of energy, most climate models predict that the earth's temperature will increase by at least 5 degrees Fahrenheit in the next 90 years [1], a change that could cause ecological disasters on a global scale. While there are of course numerous facets to the energy problem, there is a growing consensus that many energy and sustainability problems are fundamentally informatics problems, areas where machine learning can play a significant role.

This paper looks specifically at the task of energy disaggregation, an informatics task relating to energy efficiency. Energy disaggregation, also called non-intrusive load monitoring [11], involves taking an aggregated energy signal, for example the total power consumption of a house as read by an electricity meter, and separating it into the different electrical appliances being used. Numerous studies have shown that receiving information about one's energy usage can automatically induce energy-conserving behaviors [6, 19], and these studies also clearly indicate that receiving appliance-specific information leads to much larger gains than whole-home data alone ([19] estimates that appliance-level data could reduce consumption by an average of 12% in the residential sector). In the United States, electricity constitutes 38% of all energy used, and residential and commercial buildings together use 75% of this electricity [28]; thus, this 12% figure accounts for a sizable amount of energy that could potentially be saved. However, the widely-available sensors that provide electricity consumption information, namely the so-called "Smart Meters" that are already becoming ubiquitous, collect energy information only at the whole-home level and at a very low resolution (typically every hour or 15 minutes). Thus, energy disaggregation methods that can take this whole-home data and use it to predict individual appliance usage present an algorithmic challenge where advances can have a significant impact on large-scale energy efficiency issues.
Energy disaggregation methods do have a long history in the engineering community, including some which have applied machine learning techniques: early algorithms [11, 26] typically looked for "edges" in the power signal to indicate whether a known device was turned on or off; later work focused on computing harmonics of steady-state power or current draw to determine more complex device signatures [16, 14, 25, 2]; recently, researchers have analyzed the transient noise of an electrical circuit that occurs when a device changes state [15, 21]. However, these and all other studies we are aware of were either conducted in artificial laboratory environments, contained a relatively small number of devices, trained and tested on the same set of devices in a house, and/or used custom hardware for very high frequency electrical monitoring with an algorithmic focus on "event detection" (detecting when different appliances were turned on and off). In contrast, in this paper we focus on disaggregating electricity using low-resolution, hourly data of the type that is readily available via smart meters (but where most single-device "events" are not apparent); we specifically look at the generalization ability of our algorithms for devices and homes unseen at training time; and we consider a data set that is substantially larger than those previously considered, with 590 homes, 10,165 unique devices, and energy usage spanning a time period of over two years.

The algorithmic approach we present in this paper builds upon sparse coding methods and recent work in single-channel source separation [24, 23, 22]. Specifically, we use a sparse coding algorithm to learn a model of each device's power consumption over a typical week, then combine these learned models to predict the power consumption of different devices in previously unseen homes, using their aggregate signal alone. While energy disaggregation can naturally be formulated as such a single-channel source separation problem, we know of no previous application of these methods to the energy disaggregation task. Indeed, the most common application of such algorithms is audio signal separation, which typically has very high temporal resolution; thus, the low-resolution energy disaggregation task we consider here poses a new set of challenges for such methods, and existing approaches alone perform quite poorly.

As a second major contribution of the paper, we develop a novel approach for discriminatively training sparse coding dictionaries for disaggregation tasks, and show that this significantly improves performance on our energy domain. Specifically, we formulate the task of maximizing disaggregation performance as a structured prediction problem, which leads to a simple and effective algorithm for discriminatively training such sparse representations for disaggregation tasks. The algorithm is similar in spirit to a number of recent approaches to discriminative training of sparse representations [12, 17, 18]. However, these past works were interested in discriminatively training sparse coding representations specifically for classification tasks, whereas we focus here on discriminatively training the representation for disaggregation tasks, which naturally leads to substantially different algorithmic approaches.

2 Discriminative Disaggregation via Sparse Coding

We begin by reviewing sparse coding methods and their application to disaggregation tasks. For concreteness we use the terminology of our energy disaggregation domain throughout this description, but the algorithms can apply equally to other domains. Formally, assume we are given $k$ different classes, which in our setting correspond to device categories such as televisions, refrigerators, heaters, etc. For every $i = 1, \ldots, k$, we have a matrix $X_i \in \mathbb{R}^{T \times m}$ where each column of $X_i$ contains a week of energy usage (measured every hour) for a particular house and for this particular type of device. Thus, for example, the $j$th column of $X_1$, which we denote $x_1^{(j)}$, may contain weekly energy consumption for a refrigerator (for a single week in a single house) and $x_2^{(j)}$ could contain weekly energy consumption of a heater (for this same week in the same house). We denote the aggregate power consumption over all device types as $\bar{X} \equiv \sum_{i=1}^k X_i$, so that the $j$th column of $\bar{X}$, $\bar{x}^{(j)}$, contains a week of aggregated energy consumption for all devices in a given house. At training time, we assume we have access to the individual device energy readings $X_1, \ldots, X_k$ (obtained for example from plug-level monitors in a small number of instrumented homes). At test time, however, we assume that we have access only to the aggregate signal of a new set of data points $\bar{X}'$ (as would be reported by a smart meter), and the goal is to separate this signal into its components, $X_1', \ldots, X_k'$.

The sparse coding approach to source separation (e.g., [24, 23]), which forms the basis for our disaggregation approach, is to train separate models for each individual class, then use these models to separate an aggregate signal. Formally, sparse coding models the $i$th data matrix using the approximation $X_i \approx B_i A_i$, where the columns of $B_i \in \mathbb{R}^{T \times n}$ contain a set of $n$ basis functions, also called the dictionary, and the columns of $A_i \in \mathbb{R}^{n \times m}$ contain the activations of these basis functions [20]. Sparse coding additionally imposes the constraint that the activations $A_i$ be sparse, i.e., that they contain mostly zero entries, which allows us to learn overcomplete representations of the data (more basis functions than the dimensionality of the data). A common approach for achieving this sparsity is to add an $\ell_1$ regularization penalty to the activations. Since energy usage is an inherently non-negative quantity, we impose the further constraint that the activations and bases be non-negative, an extension known as non-negative sparse coding [13, 7]. Specifically, in this paper we will consider the non-negative sparse coding objective

$$\min_{A_i \geq 0, B_i \geq 0} \; \frac{1}{2} \|X_i - B_i A_i\|_F^2 + \lambda \sum_{p,q} (A_i)_{pq} \quad \text{subject to } \|b_i^{(j)}\|_2 \leq 1, \; j = 1, \ldots, n, \quad (1)$$

where $X_i$, $A_i$, and $B_i$ are defined as above, $\lambda \in \mathbb{R}_+$ is a regularization parameter, $\|Y\|_F \equiv (\sum_{p,q} Y_{pq}^2)^{1/2}$ is the Frobenius norm, and $\|y\|_2 \equiv (\sum_p y_p^2)^{1/2}$ is the $\ell_2$ norm. This optimization problem is not jointly convex in $A_i$ and $B_i$, but it is convex in each optimization variable when holding the other fixed, so a common strategy for optimizing (1) is to alternate between minimizing the objective over $A_i$ and $B_i$.

After using the above procedure to find representations $A_i$ and $B_i$ for each of the classes $i = 1, \ldots, k$, we can disaggregate a new aggregate signal $\bar{X} \in \mathbb{R}^{T \times m'}$ (without providing the algorithm its individual components), using the following procedure (used by, e.g., [23], amongst others). We concatenate the bases to form a single joint set of basis functions and solve the optimization problem

$$\hat{A}_{1:k} = \arg\min_{A_{1:k} \geq 0} \; \frac{1}{2} \left\| \bar{X} - [B_1 \cdots B_k] \begin{bmatrix} A_1 \\ \vdots \\ A_k \end{bmatrix} \right\|_F^2 + \lambda \sum_{i,p,q} (A_i)_{pq} \equiv \arg\min_{A_{1:k} \geq 0} F(\bar{X}, B_{1:k}, A_{1:k}), \quad (2)$$

where for ease of notation we use $A_{1:k}$ as shorthand for $A_1, \ldots, A_k$, and we abbreviate the optimization objective as $F(\bar{X}, B_{1:k}, A_{1:k})$. We then predict the $i$th component of the signal to be

$$\hat{X}_i = B_i \hat{A}_i. \quad (3)$$

The intuition behind this approach is that if $B_i$ is trained to reconstruct the $i$th class with small activations, then it should be better at reconstructing the $i$th portion of the aggregate signal (i.e., require smaller activations) than all other bases $B_j$ for $j \neq i$. We can evaluate the quality of the resulting disaggregation by what we refer to as the disaggregation error,

$$E(X_{1:k}, B_{1:k}) \equiv \sum_{i=1}^k \frac{1}{2} \|X_i - B_i \hat{A}_i\|_F^2 \quad \text{subject to } \hat{A}_{1:k} = \arg\min_{A_{1:k} \geq 0} F\left(\sum_{i=1}^k X_i, B_{1:k}, A_{1:k}\right), \quad (4)$$

which quantifies how accurately we reconstruct each individual class when using the activations obtained only via the aggregated signal.

2.1 Structured Prediction for Discriminative Disaggregation Sparse Coding

An issue with using sparse coding alone for disaggregation tasks is that the bases are not trained to minimize the disaggregation error. Instead, the method relies on the hope that learning basis functions for each class individually will produce bases that are distinct enough to also produce small disaggregation error. Furthermore, it is very difficult to optimize the disaggregation error directly over $B_{1:k}$, due to the non-differentiability (and discontinuity) of the argmin operator with a non-negativity constraint. One could imagine an alternating procedure where we iteratively optimize over $B_{1:k}$, ignoring the dependence of $\hat{A}_{1:k}$ on $B_{1:k}$, then re-solve for the activations $\hat{A}_{1:k}$; but ignoring how $\hat{A}_{1:k}$ depends on $B_{1:k}$ loses much of the problem's structure and this approach performs very poorly in practice. Alternatively, other methods (though in a different context from disaggregation) have been proposed that use a differentiable objective function and implicit differentiation to explicitly model the derivative of the activations with respect to the basis functions [4]; however, this formulation loses some of the benefits of the standard sparse coding formulation, and computing these derivatives is a computationally expensive procedure.

Instead, we propose in this paper a method for optimizing disaggregation performance based upon structured prediction methods [27]. To describe our approach, we first define the regularized disaggregation error, which is simply the disaggregation error plus a regularization penalty on $\hat{A}_{1:k}$,

$$E_{\mathrm{reg}}(X_{1:k}, B_{1:k}) \equiv E(X_{1:k}, B_{1:k}) + \lambda \sum_{i,p,q} (\hat{A}_i)_{pq}, \quad (5)$$

where $\hat{A}_i$ is defined as in (2). This criterion provides a better optimization objective for our algorithm, as we wish to obtain a sparse set of coefficients that can achieve low disaggregation error. Clearly, the best possible value of $\hat{A}_i$ for this objective function is given by

$$A_i^\star = \arg\min_{A_i \geq 0} \; \frac{1}{2} \|X_i - B_i A_i\|_F^2 + \lambda \sum_{p,q} (A_i)_{pq}, \quad (6)$$

which is precisely the activations obtained after an iteration of sparse coding on the data matrix $X_i$. Motivated by this fact, the first intuition of our algorithm is that in order to minimize disaggregation error, we can discriminatively optimize the bases $B_{1:k}$ such that performing the optimization (2) produces activations that are as close to $A_{1:k}^\star$ as possible. Of course, changing the bases $B_{1:k}$ to optimize this criterion would also change the resulting optimal coefficients $A_{1:k}^\star$. Thus, the second intuition of our method is that the bases used in the optimization (2) need not be the same as the bases used to reconstruct the signals. We define an augmented regularized disaggregation error objective

$$\tilde{E}_{\mathrm{reg}}(X_{1:k}, B_{1:k}, \tilde{B}_{1:k}) \equiv \sum_{i=1}^k \left( \frac{1}{2} \|X_i - B_i \hat{A}_i\|_F^2 + \lambda \sum_{p,q} (\hat{A}_i)_{pq} \right) \quad \text{subject to } \hat{A}_{1:k} = \arg\min_{A_{1:k} \geq 0} F\left(\sum_{i=1}^k X_i, \tilde{B}_{1:k}, A_{1:k}\right), \quad (7)$$

where the $B_{1:k}$ bases (referred to as the reconstruction bases) are the same as those learned from sparse coding, while the $\tilde{B}_{1:k}$ bases (referred to as the disaggregation bases) are discriminatively optimized in order to move $\hat{A}_{1:k}$ closer to $A_{1:k}^\star$, without changing these targets.

Discriminatively training the disaggregation bases $\tilde{B}_{1:k}$ is naturally framed as a structured prediction task: the input is $\bar{X}$, the multi-variate desired output is $A_{1:k}^\star$, the model parameters are $\tilde{B}_{1:k}$, and the discriminant function is $F(\bar{X}, \tilde{B}_{1:k}, A_{1:k})$.¹ In other words, we seek bases $\tilde{B}_{1:k}$ such that (ideally)

$$A_{1:k}^\star = \arg\min_{A_{1:k} \geq 0} F(\bar{X}, \tilde{B}_{1:k}, A_{1:k}). \quad (8)$$

While there are many potential methods for optimizing such a prediction task, we use a simple method based on the structured perceptron algorithm [5]. Given some value of the parameters $\tilde{B}_{1:k}$, we first compute $\hat{A}_{1:k}$ using (2). We then perform the perceptron update with a step size $\alpha$,

$$\tilde{B}_{1:k} \leftarrow \tilde{B}_{1:k} - \alpha \left( \nabla_{\tilde{B}_{1:k}} F(\bar{X}, \tilde{B}_{1:k}, A_{1:k}^\star) - \nabla_{\tilde{B}_{1:k}} F(\bar{X}, \tilde{B}_{1:k}, \hat{A}_{1:k}) \right), \quad (9)$$

or more explicitly, defining $\tilde{B} = [\tilde{B}_1 \cdots \tilde{B}_k]$ and $A^\star = [A_1^{\star T} \cdots A_k^{\star T}]^T$ (and similarly for $\hat{A}$),

$$\tilde{B} \leftarrow \tilde{B} - \alpha \left( (\bar{X} - \tilde{B}\hat{A})\hat{A}^T - (\bar{X} - \tilde{B}A^\star)(A^\star)^T \right). \quad (10)$$

To keep $\tilde{B}_{1:k}$ in a similar form to $B_{1:k}$, we keep only the positive part of $\tilde{B}_{1:k}$ and we re-normalize each column to have unit norm. One item to note is that, unlike typical structured prediction where the discriminant is a linear function in the parameters (which guarantees convexity of the problem), here our discriminant is a quadratic function of the parameters, and so we no longer expect to necessarily reach a global optimum of the prediction problem; however, since sparse coding itself is a non-convex problem, this is not overly concerning for our setting. Our complete method for discriminative disaggregation sparse coding, which we call DDSC, is shown in Algorithm 1.

¹The structured prediction task actually involves $m$ examples (where $m$ is the number of columns of $\bar{X}$), and the goal is to output the desired activations $(a_{1:k}^\star)^{(j)}$ for the $j$th example $\bar{x}^{(j)}$. However, since the function $F$ decomposes across the columns of $\bar{X}$ and $A$, the above notation is equivalent to the more explicit formulation.

Algorithm 1 Discriminative disaggregation sparse coding

Input: data points for each individual source $X_i \in \mathbb{R}^{T \times m}$, $i = 1, \ldots, k$, regularization parameter $\lambda \in \mathbb{R}_+$, gradient step size $\alpha \in \mathbb{R}_+$.

Sparse coding pre-training:
1. Initialize $B_i$ and $A_i$ with positive values and scale columns of $B_i$ such that $\|b_i^{(j)}\|_2 = 1$.
2. For each $i = 1, \ldots, k$, iterate until convergence:
   (a) $A_i \leftarrow \arg\min_{A \geq 0} \frac{1}{2}\|X_i - B_i A\|_F^2 + \lambda \sum_{p,q} A_{pq}$
   (b) $B_i \leftarrow \arg\min_{B \geq 0, \, \|b^{(j)}\|_2 \leq 1} \|X_i - B A_i\|_F^2$

Discriminative disaggregation training:
3. Set $A_{1:k}^\star \leftarrow A_{1:k}$, $\tilde{B}_{1:k} \leftarrow B_{1:k}$.
4. Iterate until convergence:
   (a) $\hat{A}_{1:k} \leftarrow \arg\min_{A_{1:k} \geq 0} F(\bar{X}, \tilde{B}_{1:k}, A_{1:k})$
   (b) $\tilde{B} \leftarrow \left[ \tilde{B} - \alpha \left( (\bar{X} - \tilde{B}\hat{A})\hat{A}^T - (\bar{X} - \tilde{B}A^\star)(A^\star)^T \right) \right]_+$
   (c) For all $i, j$, $\tilde{b}_i^{(j)} \leftarrow \tilde{b}_i^{(j)} / \|\tilde{b}_i^{(j)}\|_2$.

Given aggregated test examples $\bar{X}'$:
5. $\hat{A}_{1:k}' \leftarrow \arg\min_{A_{1:k} \geq 0} F(\bar{X}', \tilde{B}_{1:k}, A_{1:k})$
6. Predict $\hat{X}_i' = B_i \hat{A}_i'$.
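To make the flow of Algorithm 1 concrete, the following is a minimal NumPy sketch, not the authors' implementation: all function names, iteration counts, and step sizes are our own assumptions, and the activation and basis subproblems are solved by simple projected gradient steps rather than the coordinate-descent and multiplicative NMF updates the paper actually uses.

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize_cols(B):
    """Scale each column of B toward unit l2 norm (step 4(c) style)."""
    return B / np.maximum(np.linalg.norm(B, axis=0, keepdims=True), 1e-12)

def nn_sparse_code(X, B, lam, iters=200):
    """Activation step: min_{A>=0} 0.5||X - B A||_F^2 + lam*sum(A),
    solved here by projected (proximal) gradient descent."""
    step = 1.0 / (np.linalg.norm(B, 2) ** 2 + 1e-8)   # 1 / Lipschitz constant
    A = np.zeros((B.shape[1], X.shape[1]))
    for _ in range(iters):
        A = np.maximum(A - step * (B.T @ (B @ A - X) + lam), 0.0)
    return A

def pretrain(X, n_bases, lam, outer=20):
    """Steps 1-2 for one device class: alternate A- and B-steps
    (projected gradient on B instead of the multiplicative update of [7])."""
    B = normalize_cols(rng.random((X.shape[0], n_bases)))
    A = nn_sparse_code(X, B, lam)
    for _ in range(outer):
        step = 1.0 / (np.linalg.norm(A, 2) ** 2 + 1e-8)
        B = normalize_cols(np.maximum(B - step * (B @ A - X) @ A.T, 0.0))
        A = nn_sparse_code(X, B, lam)
    return B, A

def ddsc(Xs, n_bases=5, lam=0.1, alpha=1e-4, iters=10):
    """Steps 3-4: structured-perceptron updates of the disaggregation
    bases B_tilde toward the pre-training activations A*."""
    Xbar = sum(Xs)                                    # aggregate training signal
    pairs = [pretrain(X, n_bases, lam) for X in Xs]
    B = np.hstack([b for b, _ in pairs])              # reconstruction bases B_{1:k}
    A_star = np.vstack([a for _, a in pairs])         # targets A*_{1:k}
    B_til = B.copy()
    for _ in range(iters):
        A_hat = nn_sparse_code(Xbar, B_til, lam)      # step 4(a)
        # step 4(b): perceptron update of eq. (10), then positive part + renorm
        grad = (Xbar - B_til @ A_hat) @ A_hat.T - (Xbar - B_til @ A_star) @ A_star.T
        B_til = normalize_cols(np.maximum(B_til - alpha * grad, 0.0))
    return B, B_til

def disaggregate(Xbar_new, B, B_til, n_bases, lam=0.1):
    """Steps 5-6: code the aggregate signal with B_tilde, reconstruct with B."""
    A_hat = nn_sparse_code(Xbar_new, B_til, lam)
    return [B[:, i * n_bases:(i + 1) * n_bases] @ A_hat[i * n_bases:(i + 1) * n_bases]
            for i in range(B.shape[1] // n_bases)]
```

Note the key asymmetry of the method: `B_til` is what codes the aggregate signal, while the untouched pre-training bases `B` perform the final per-class reconstruction, exactly as in equation (7).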
2.2 Extensions

Although, as we show shortly, the discriminative training procedure has made the largest difference in terms of improving disaggregation performance in our domain, a number of other modifications to the standard sparse coding formulation have also proven useful. Since these are typically trivial extensions or well-known algorithms, we mention them only briefly here.

Total Energy Priors. One deficiency of the sparse coding framework for energy disaggregation is that the optimization objective does not take into consideration the size of an energy signal for determining which class it belongs to, just its shape. Since total energy used is obviously a discriminating factor for different device types, we consider an extension that penalizes the $\ell_2$ deviation between a device and its mean total energy. Formally, we augment the objective $F$ with the penalty

$$F_{\mathrm{TEP}}(\bar{X}, B_{1:k}, A_{1:k}) = F(\bar{X}, B_{1:k}, A_{1:k}) + \lambda_{\mathrm{TEP}} \sum_{i=1}^k \|\mu_i \mathbf{1}^T - \mathbf{1}^T B_i A_i\|_2^2, \quad (11)$$

where $\mathbf{1}$ denotes a vector of ones of the appropriate size, and $\mu_i = \frac{1}{m} \mathbf{1}^T X_i \mathbf{1}$ denotes the average total energy of device class $i$.

Group Lasso. Since the data set we consider exhibits some amount of sparsity at the device level (i.e., several examples have zero energy consumed by certain device types, as there is either no such device in the home or it was not being monitored), we also would like to encourage a grouping effect to the activations. That is, we would like a certain coefficient being active for a particular class to encourage other coefficients to also be active in that class. To achieve this, we employ the group Lasso algorithm [29], which adds an $\ell_2$ norm penalty to the activations of each device,

$$F_{\mathrm{GL}}(\bar{X}, B_{1:k}, A_{1:k}) = F(\bar{X}, B_{1:k}, A_{1:k}) + \lambda_{\mathrm{GL}} \sum_{i=1}^k \sum_{j=1}^m \|a_i^{(j)}\|_2. \quad (12)$$

Shift Invariant Sparse Coding. Shift invariant, or convolutional, sparse coding is an extension to the standard sparse coding framework where each basis is convolved over the input data, with a separate activation for each shift position [3, 10]. Such a scheme may intuitively seem to be beneficial for the energy disaggregation task, where a given device might exhibit the same energy signature at different times. However, as we will show in the next section, this extension actually performs worse in our domain; this is likely due to the fact that, since we have ample training data and a relatively low-dimensional domain (each energy signal has 168 dimensions, 24 hours per day times 7 days in the week), the standard sparse coding bases are able to cover all possible shift positions for typical device usage. However, pure shift invariant bases cannot capture information about when in the week or day each device is typically used, and such information has proven crucial for disaggregation performance.

2.3 Implementation

Space constraints preclude a full discussion of the implementation details of our algorithms, but for the most part we rely on standard methods for solving the optimization problems. In particular, most of the time spent by the algorithm involves solving sparse optimization problems to find the activation coefficients, namely steps 2a and 4a in Algorithm 1. We use a coordinate descent approach here, both for the standard and group Lasso versions of the optimization problems, as these have recently been shown to be efficient algorithms for $\ell_1$-type optimization problems [8, 9], and have the added benefit that we can warm-start the optimization with the solution from previous iterations. To solve the optimization over $B$ in step 2b, we use the multiplicative non-negative matrix factorization update from [7].

3 Experimental Results

3.1 The Plugwise Energy Data Set and Experimental Setup

We conducted this work using a data set provided by Plugwise, a European manufacturer of plug-level monitoring devices. The data set contains hourly energy readings from 10,165 different devices in 590 homes, collected over more than two years. Each device is labeled with one of 52 device types, which we further reduce to ten broad categories of electrical devices: lighting, TV, computer, other electronics, kitchen appliances, washing machine and dryer, refrigerator and freezer, dishwasher, heating/cooling, and a miscellaneous category. We look at time periods in blocks of one week, and try to predict the individual device consumption over this week given only the whole-home signal (since the data set does not currently contain true whole-home energy readings, we approximate the home's overall energy usage by aggregating the individual devices). Crucially, we focus on disaggregating data from homes that are absent from the training set (we assigned 70% of the homes to the training set, and 30% to the test set, resulting in 17,133 total training weeks and 6,846 testing weeks); thus, we are attempting to generalize over the basic category of devices, not just over different uses of the same device in a single house. We fit the hyper-parameters of the algorithms (number of bases and regularization parameters) using grid search over each parameter independently on a cross validation set consisting of 20% of the training homes.

3.2 Qualitative Evaluation of the Disaggregation Algorithms

We first look qualitatively at the results obtained by the method. Figure 1 shows the true energy consumed by two different houses in the test set for two different weeks, along with the energy consumption predicted by our algorithms. The figure shows both the predicted energy of several devices over the whole week, as well as a pie chart that shows the relative energy consumption of different device types over the whole week (a more intuitive display of energy consumed over the week). In many cases, certain devices like the refrigerator, washer/dryer, and computer are predicted quite accurately, both in terms of the total predicted percentage and in terms of the signals themselves. There are also cases where certain devices are not predicted well, such as underestimating the heating component in the example on the left, and predicting a spike in computer usage in the example on the right when it was in fact a dishwasher. Nonetheless, despite some poor predictions at the hourly device level, the breakdown of electric consumption is still quite informative, determining the approximate percentage of many device types and demonstrating the promise of such feedback.

Figure 1: Example predicted energy profiles and total energy percentages (best viewed in color). Blue lines show the true energy usage, and red the predicted usage, both in units of kWh.

In addition to the disaggregation results themselves, sparse coding representations of the different device types are interesting in their own right, as they give a good intuition about how the different devices are typically used. Figure 2 shows a graphical representation of the learned basis functions. In each plot, the grayscale image on the right shows an intensity map of all basis functions learned for that device category, where each column in the image corresponds to a learned basis. The plot on the left shows examples of seven basis functions for the different device types. Notice, for example, that the bases learned for the washer/dryer devices are nearly all heavily peaked, while the refrigerator bases are much lower in maximum magnitude. Additionally, in the basis images, devices like lighting demonstrate a clear "band" pattern, indicating that these devices are likely to be on and off during certain times of the day (each basis covers a week of energy usage, so the seven bands represent the seven days). The plots also suggest why the standard implementation of shift invariance is not helpful here. There is sufficient training data such that, for devices like washers and dryers, we learn a separate basis for all possible shifts. In contrast, for devices like lighting, where the time of usage is an important factor, simple shift-invariant bases miss key information.

Figure 2: Example basis functions learned from three device categories (lighting, refrigerator, washer/dryer; best viewed in color). The plot on the left shows seven example bases, while the image on the right shows all learned basis functions (one basis per column).

3.3 Quantitative Evaluation of the Disaggregation Methods

There are a number of components to the final algorithm we have proposed, and in this section we present quantitative results that evaluate the performance of each of these different components. While many of the algorithmic elements improve the disaggregation performance, the results in this section show that the discriminative training in particular is crucial for optimizing disaggregation performance. The most natural metric for evaluating disaggregation performance is the disaggregation error in (4). However, average disaggregation error is not a particularly intuitive metric, and so we also evaluate a total-week accuracy of the prediction system, defined formally as

$$\text{Accuracy} \equiv \frac{\sum_{i,q} \min\left\{ \sum_p (X_i)_{pq}, \; \sum_p (B_i \hat{A}_i)_{pq} \right\}}{\sum_{p,q} \bar{X}_{pq}}. \quad (13)$$

Method                     Train Disagg. Err.   Train Acc.   Test Disagg. Err.   Test Acc.
Predict Mean Energy        20.98                45.78%       21.72               47.41%
SISC                       20.84                41.87%       24.08               41.79%
Sparse Coding              10.54                56.96%       18.69               48.00%
Sparse Coding + TEP        11.27                55.52%       16.86               50.62%
Sparse Coding + GL         10.55                54.98%       17.18               46.46%
Sparse Coding + TEP + GL    9.24                58.03%       14.05               52.52%
DDSC                        7.20                64.42%       15.59               53.70%
DDSC + TEP                  8.99                59.61%       15.61               53.23%
DDSC + GL                   7.59                63.09%       14.58               52.20%
DDSC + TEP + GL             7.92                61.64%       13.20               55.05%

Table 1: Disaggregation results of algorithms (TEP = Total Energy Prior, GL = Group Lasso, SISC = Shift Invariant Sparse Coding, DDSC = Discriminative Disaggregation Sparse Coding).
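The penalty extensions of Section 2.2 are simple additive terms on the objective $F$. As an illustration only, the helpers below evaluate the base per-class objective of eq. (1) and the two penalties of eqs. (11) and (12), assuming the per-class matrices are passed as lists of NumPy arrays (the function names are our own):

```python
import numpy as np

def f_base(X_i, B_i, A_i, lam):
    """Per-class non-negative sparse coding objective, as in eq. (1)."""
    return 0.5 * np.linalg.norm(X_i - B_i @ A_i, 'fro') ** 2 + lam * A_i.sum()

def total_energy_prior(Xs, Bs, As, lam_tep):
    """Eq. (11): penalize deviation of each class's predicted total energy
    (column sums of B_i A_i) from its mean total energy mu_i."""
    pen = 0.0
    for X_i, B_i, A_i in zip(Xs, Bs, As):
        mu_i = X_i.sum() / X_i.shape[1]              # average total energy, (1/m) 1^T X_i 1
        ones = np.ones(X_i.shape[1])
        pen += np.linalg.norm(mu_i * ones - (B_i @ A_i).sum(axis=0)) ** 2
    return lam_tep * pen

def group_lasso(As, lam_gl):
    """Eq. (12): sum of l2 norms of each example's activation column,
    per class; drives whole device classes to switch off together."""
    return lam_gl * sum(np.linalg.norm(A_i, axis=0).sum() for A_i in As)
```

The group Lasso term is not differentiable at zero, which is why (as Section 2.3 notes) the paper solves these activation problems with coordinate descent rather than plain gradient steps.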
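The accuracy metric of eq. (13) reduces to a few lines of code: sum the per-example overlap between true and predicted weekly energy for each class, then normalize by the total aggregate energy. A sketch, with an invented helper name, assuming the true and predicted signals are given as lists of $T \times m$ arrays:

```python
import numpy as np

def disagg_accuracy(X_true, X_pred):
    """Total-week accuracy of eq. (13): overlap between true and predicted
    per-class weekly energy totals, normalized by total aggregate energy.
    X_true, X_pred: lists (one entry per device class) of T x m arrays."""
    overlap = sum(np.minimum(Xt.sum(axis=0), Xp.sum(axis=0)).sum()
                  for Xt, Xp in zip(X_true, X_pred))
    total = sum(Xt.sum() for Xt in X_true)           # equals sum of the aggregate
    return overlap / total
```

Because of the min, over-predicting one class cannot compensate for under-predicting another, which is why this metric captures the overlap between the true and predicted energy pie charts.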
Figure 3: Evolution of training and testing errors for iterations of the discriminative DDSC updates.

Despite the complex definition, this quantity simply captures the average amount of energy predicted correctly over the week (i.e., the overlap between the true and predicted energy pie charts). Table 1 shows the disaggregation performance obtained by many different prediction methods. The advantage of the discriminative training procedure is clear: all the methods employing discriminative training perform nearly as well or better than all the methods without discriminative training; furthermore, the system with all the extensions (discriminative training, a total energy prior, and the group Lasso) outperforms all competing methods on both metrics. To put these accuracies in context, we note that, separate to the results presented here, we trained an SVM, using a variety of hand-engineered features, to classify individual energy signals into their device category, and were able to achieve at most 59% classification accuracy. It therefore seems unlikely that we could disaggregate a signal to above this accuracy, and so, informally speaking, we expect the achievable performance on this particular data set to range between 47% for the baseline of predicting mean energy (which in fact is a very reasonable method, as devices often follow their average usage patterns) and 59% for the individual classification accuracy. It is clear, then, that the discriminative training is crucial to improving the performance of the sparse coding disaggregation procedure within this range, and does provide a significant improvement over the baseline. Finally, as shown in Figure 3, both the training and testing error decrease reliably with iterations of DDSC, and we have found that this result holds for a wide range of parameter choices and step sizes (though, as with all gradient methods, some care must be taken to choose a step size that is not prohibitively large).

4 Conclusion

Energy disaggregation is a domain where advances in machine learning can have a significant impact on energy use. In this paper we presented an application of sparse coding algorithms to this task, focusing on a large data set that contains the type of low-resolution data readily available from smart meters. We developed the discriminative disaggregation sparse coding (DDSC) algorithm, a novel discriminative training procedure, and showed that this algorithm significantly improves the accuracy of sparse coding for the energy disaggregation task.

Acknowledgments

This work was supported by ARPA-E (Advanced Research Projects Agency–Energy) under grant number DE-AR0000018. We are very grateful to Plugwise for providing us with their plug-level energy data set, and in particular we thank Willem Houck for his assistance with this data. We also thank Carrie Armel and Adrian Albert for helpful discussions.

References

[1] D. Archer. Global Warming: Understanding the Forecast. Blackwell Publishing, 2008.
[2] M. Berges, E. Goldman, H. S. Matthews, and L. Soibelman. Learning systems for electric consumption of buildings. In ASCE International Workshop on Computing in Civil Engineering, 2009.
[3] T. Blumensath and M. Davies. On shift-invariant sparse coding. Lecture Notes in Computer Science, 3195(1):1205–1212, 2004.
[4] D. Bradley and J. A. Bagnell. Differentiable sparse coding. In Advances in Neural Information Processing Systems, 2008.
[5] M. Collins. Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, 2002.
[6] S. Darby. The effectiveness of feedback on energy consumption. Technical report, Environmental Change Institute, University of Oxford, 2006.
[7] J. Eggert and E. Korner. Sparse coding and NMF. In IEEE International Joint Conference on Neural Networks, 2004.
[8] J. Friedman, T. Hastie, H. Hoefling, and R. Tibshirani. Pathwise coordinate optimization. The Annals of Applied Statistics, 2(1):302–332, 2007.
[9] J. Friedman, T. Hastie, and R. Tibshirani. A note on the group lasso and a sparse group lasso. Technical report, Stanford University, 2010.
[10] R. Grosse, R. Raina, H. Kwong, and A. Y. Ng. Shift-invariant sparse coding for audio classification. In Proceedings of the Conference on Uncertainty in Artificial Intelligence, 2007.
[11] G. Hart. Nonintrusive appliance load monitoring. Proceedings of the IEEE, 80(12), 1992.
[12] S. Hasler, H. Wersing, and E. Korner. Combining reconstruction and discrimination with class-specific sparse coding. Neural Computation, 19(7):1897–1918, 2007.
[13] P. O. Hoyer. Non-negative sparse coding. In IEEE Workshop on Neural Networks for Signal Processing, 2002.
[14] C. Laughman, K. Lee, R. Cox, S. Shaw, S. Leeb, L. Norford, and P. Armstrong. Power signature analysis. IEEE Power & Energy Magazine, 2003.
[15] C. Laughman, S. Leeb, and Lee. Advanced non-intrusive monitoring of electric loads. IEEE Power and Energy, 2003.
[16] W. Lee, G. Fung, H. Lam, F. Chan, and M. Lucente. Exploration on load signatures. International Conference on Electrical Engineering (ICEE), 2004.
[17] J. Mairal, F. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Supervised dictionary learning. In Advances in Neural Information Processing Systems, 2008.
[18] J. Mairal, M. Leordeanu, F. Bach, M. Hebert, and J. Ponce. Discriminative sparse image models for class-specific edge detection and image interpretation. In European Conference on Computer Vision, 2008.
[19] B. Neenan and J. Robinson. Residential electricity use feedback: A research synthesis and economic framework. Technical report, Electric Power Research Institute, 2009.
[20] B. A. Olshausen and D. J. Field. Emergence of simple-cell receptive field properties by learning a sparse code for natural images. Nature, 381:607–609, 1996.
[21] S. N. Patel, T. Robertson, J. A. Kientz, M. S. Reynolds, and G. D. Abowd. At the flick of a switch: Detecting and classifying unique electrical events on the residential power line. In 9th International Conference on Ubiquitous Computing (UbiComp 2007), 2007.
[22] S. T. Roweis. One microphone source separation. In Advances in Neural Information Processing Systems, 2000.
[23] M. N. Schmidt, J. Larsen, and F. Hsiao. Wind noise reduction using non-negative sparse coding. In IEEE Workshop on Machine Learning for Signal Processing, 2007.
[24] M. N. Schmidt and R. K. Olsson. Single-channel speech separation using sparse non-negative matrix factorization. In International Conference on Spoken Language Processing, 2006.
[25] S. R. Shaw, C. B. Abler, R. F. Lepard, D. Luo, S. B. Leeb, and L. K. Norford. Instrumentation for high performance nonintrusive electrical load monitoring. ASME, 120(224), 1998.
[26] F. Sultanem. Using appliance signatures for monitoring residential loads at meter panel level. IEEE Transactions on Power Delivery, 6(4), 1991.
[27] B. Taskar, V. Chatalbashev, D. Koller, and C. Guestrin. Learning structured prediction models: A large margin approach. In International Conference on Machine Learning, 2005.
[28] Various. Annual Energy Review 2009. U.S. Energy Information Administration, 2009.
[29] M. Yuan and Y. Lin. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society, Series B, 68(1):49–67, 2007.