REDD: A Public Data Set for Energy Disaggregation Research

J. Zico Kolter
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Cambridge, MA
kolter@csail.mit.edu

Matthew J. Johnson
Laboratory for Information and Decision Systems
Massachusetts Institute of Technology
Cambridge, MA
mattjj@csail.mit.edu

ABSTRACT

Energy and sustainability issues raise a large number of problems that can be tackled using approaches from data mining and machine learning, but traction on such problems has been slow due to the lack of publicly available data. In this paper we present the Reference Energy Disaggregation Data Set (REDD), a freely available data set containing detailed power usage information from several homes, which is aimed at furthering research on energy disaggregation (the task of determining the component appliance contributions from an aggregated electricity signal). We discuss past approaches to disaggregation and how they have influenced our design choices in collecting data, we describe the hardware and software setups for the data collection, and we present initial benchmark disaggregation results using a well-known Factorial Hidden Markov Model (FHMM) technique.

1. INTRODUCTION

Energy and sustainability problems represent one of the greatest challenges facing society. More than 83% of the world's energy comes from (unsustainable) fossil fuels, with renewable energy from wind, solar, geothermal and biomass making up only approximately 2% of the total [11]. Meanwhile, the demand for energy is constantly growing: worldwide energy production grew by 46% in the 20 years from 1987 to 2007 [11]. The simple physical limits of our current energy resources, as well as the environmental and climate impact of burning massive amounts of fossil fuels, make a research focus on issues of sustainability imperative. Furthermore, there are numerous problems in sustainability that are fundamentally data analysis and prediction tasks, areas where techniques from data mining and machine learning can prove invaluable.

Despite the importance of sustainability research and the relevance of data mining and machine learning techniques, there has been relatively little work in these areas, at least compared to other application areas such as computational biology or machine vision. We argue that this situation is at least partly due to the scarcity of publicly available data for such domains. For example, although there are vast amounts of data relevant to energy domains (the energy consumption of each individual building and household in the country, the loading of each electrical transmission and distribution line), the majority of this data is unavailable to researchers. Furthermore, there is significant evidence that publicly available data sets have spurred previous application areas in machine learning and data mining: biological applications have been aided greatly by the data sharing mandates of biological journals and government organizations [16, 12]; many early successes in natural language processing were spawned by the now-classic Wall Street Journal corpus [10]; and machine vision research has been aided greatly by common benchmark data sets such as MNIST digit recognition [9], CalTech 101 [3], and the PASCAL challenge [2]. Despite some initial progress towards this same goal for energy and sustainability domains [17], there are currently few such data sets geared to the ML and data mining communities.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SustKDD 2011, August 2011, San Diego, CA, USA
Copyright 2011 ACM 978-1-4503-0840-3 ...$10.00.

Figure 1: An example of energy consumption over the course of a day for one of the houses in REDD.

In this paper, we present our work on developing a public data set of this type, termed the Reference Energy Disaggregation Data Set (REDD). The data is specifically geared towards the task of energy disaggregation: determining the component devices from an aggregated electricity signal. REDD consists of whole-home and circuit/device specific electricity consumption for a number of real houses over several months' time. For each monitored house, we record (1) the whole-home electricity signal (current monitors on both phases of power and a voltage monitor on one phase), recorded at a high frequency (15 kHz); (2) up to 24 individual circuits in the home, each labeled with its category of appliance or appliances, recorded at 0.5 Hz; (3) up to 20 plug-level monitors in the home, recorded at 1 Hz, with a focus on logging electronics devices where multiple devices are grouped to a single circuit. An example of this type of data is shown in Figure 1. As of the time of writing (June 15th, 2011), we have 10 homes monitored, with a total of 119 days of data (combined over all homes), 268 unique monitors, and more than 1 terabyte of raw data. To the best of our knowledge, REDD represents the largest publicly available data set for disaggregation with the true loads of each house identified. The entirety of the data, as well as code for parsing the data and running basic algorithms, is publicly available on the web: .

While we present some basic results on disaggregation here, the focus of this paper is the data set itself: the design decisions that went into the data collection, as well as the hardware and software system. We begin by presenting a brief overview of existing work on disaggregation and discuss how this influenced our choices of which data to collect from each home and at what frequency. We then describe the software and hardware systems we have built for this task, and discuss their strengths and limitations. Finally, we present brief results on the data, and highlight several directions for future algorithmic work.

2. ENERGY DISAGGREGATION

Energy disaggregation, also referred to as non-intrusive load monitoring (NILM),^1 is the task of using an aggregate energy signal, such as that coming from a whole-home power monitor, to make inferences about the different individual loads of the system. The value of this technology is that information about individual appliances is much more useful to consumers than simply total electricity usage; studies have shown that user feedback of this type can induce behavior changes that improve user efficiency by 15% [1, 13]. Disaggregation technology is also seen as an intermediate between existing electricity meters (which merely record whole-home power usage at some frequency) and a fully energy-aware home appliance network, where each device reports its consumption to a central location; an oft-stated goal of disaggregation research is to push energy awareness to a ubiquitous level, paving the way for more detailed energy monitoring in the future.

^1 Some authors make subtle distinctions between energy disaggregation and NILM, but for our purposes we treat these terms as synonymous.

Academic work on energy disaggregation began with the work of Hart et al. [6] in the 1980s and 1990s. The initial approaches look for sharp edges (corresponding to device on/off events) in both the real and reactive power signals, and cluster devices according to these changes in consumption. Later work has explored a number of different directions: using more complex device models with multiple states, integrating frequency analysis and other features of the AC waveforms, and making use of external features such as time of day or weather conditions. A recent review of numerous existing techniques for energy disaggregation can be found in [18]. In this paper, we highlight some of the key distinctions which have characterized past work in energy disaggregation and how they have informed our choices for REDD.

Frequency of Measurements. Past work has spanned a broad range in terms of the frequency of energy measurements used for disaggregation: some work has used average power measurements over periods as long as an hour [8], while others have analyzed the harmonics of AC waveforms using MHz resolutions [14, 5].^2 Most approaches fall somewhere in between these two extremes, with many studies using either power readings at a rate on the order of 1 Hz or AC current measurements on the order of several kilohertz. Since higher-frequency measurements can be sub-sampled to produce lower-frequency data, for our purposes of data collection it makes sense to collect data at the highest frequency possible, up to the feasibility of storing the data. We chose 15 kHz monitoring (for the whole-home data) as a trade-off between these factors.

^2 The work of Patel et al. is substantially different from most other approaches to disaggregation, as they use high frequency measurements to look for transients of the voltage signal of the home, and not necessarily the current.

Real/Reactive Power. Past work has also differed in whether the methods consider only the real power signal or both the real and reactive powers.^3 This decision is connected to the point above, since real and reactive powers can be computed using measurements of the AC waveform, but reactive power is a common enough quantity to merit its own distinction. For REDD, since we are collecting the AC waveform itself, we can easily compute both real and reactive powers.

^3 Since the data mining community may not be familiar with this terminology, briefly, real power corresponds to the power that is actually consumed by an appliance, whereas reactive power corresponds to current that flows through a circuit, but is put back into the system, typically via an inductive load in the appliance. Any text on AC power will include a rigorous description of these quantities.

Use of External Features. Some past approaches use external features such as time of day, day of year, or weather information, whereas some merely use the power signal itself. All data in REDD is recorded with UTC time stamps, along with general geographical information (only up to a city level, for privacy reasons), so that it can be associated with such external features.

Supervised/Unsupervised Training. Most approaches to energy disaggregation have been supervised, in that the system is trained on individual device power signals (or is given manually identified device change-points in a whole-home energy signal). Alternatively, some recent work has advocated unsupervised approaches that consider the whole-home signal without labeling, and automatically separate different signals [7]. To facilitate supervised approaches and to aid in evaluating all approaches, REDD includes as much "supervised" information as possible: we monitor each individual circuit in the home (especially important for large loads that cannot be easily monitored by a plug load) as well as as many large plug loads as is feasible.

Training/Testing Generalization. Another key distinction (which has not been greatly considered in past energy disaggregation work) is generalizing from training data to test data. The vast majority of previous disaggregation approaches (at least those with rigorous quantitative evaluation) have evaluated the algorithms on the same devices (but in different conditions) as they were trained on; that is, they attempt to build a model that can disaggregate a given appliance even in new conditions, but do not attempt to build models that explicitly generalize across multiple different devices of the same category. In our own past work [8] we have considered this challenge of generalizing across multiple homes, but the data used in that work was only available at an extremely low resolution (one hour), and was not permitted to be made publicly available, greatly limiting the ability of researchers to directly compare to the approach. In contrast, a goal of REDD is to consider several different homes, such that work that attempts to generalize across device types can be rigorously evaluated using this data set. As expected, and as we illustrate concretely in Section 4, generalization across homes and device categories makes disaggregation a much more challenging problem.

Evaluation Metrics. Finally, previous work in power disaggregation has used different metrics for evaluating performance: initial work typically focused only on on/off changes for devices, and the natural metric here is whether the algorithm can correctly classify which device is turning on or off given a change point in the whole-home signal. An alternative approach is to look at the percentage of energy correctly classified (the original work by Hart et al. [6] considered both these metrics). The latter has the advantage that it is more generally applicable to disaggregation tasks, since it does not rely on extracting edges in the aggregate power signal, and applies to devices with multiple states or "smooth" power ons. This metric naturally weights high-power devices more heavily than low-power devices. While we argue that this feature is often desirable, since the absolute power consumed is the ultimate quantity we hope to influence, the metric may indicate good performance even when low-power devices are classified poorly, and in some cases these low-power devices are those over which the user has greatest control. Thus, while we will consider the "total energy properly classified" metric in our experiments, REDD can accommodate many performance metrics.

3. THE REDD HARDWARE AND SOFTWARE SYSTEMS

We developed the REDD hardware and software systems with the considerations of the previous section in mind. The hardware system in each house logs data from the whole-home current and voltage (at high frequency), from each individual circuit, and from selected plugs. The data is logged both locally and to a central database, which stores information from all the houses and can be accessed via a web interface. A schematic of the system is shown in Figure 2.

Figure 2: Schematic of the different components of the REDD hardware and software system.

3.1 Hardware Setup

For plug-level data, we use a wireless plug monitoring system developed by Enmetric (http://www.enmetric.com), shown in Figure 3. The system consists of several power strips, each containing four independently monitored outlets, and a router that connects to the home's internet connection via DHCP and processes the readings from each of the wireless devices. Energy information is then sent to a central server at a rate of 1 Hz. Because the system reports at a sufficient rate and is fairly easy to install in most homes, we use the system as-is for the plug-level data collection.

Figure 3: Enmetric router and Power Port, designed and built by Enmetric Systems, Inc.

Circuit-level data and whole-home data require a more involved setup. For circuit-level data, we again make use of an off-the-shelf solution: the eMonitor, developed by Powerhouse Dynamics (http://www.powerhousedynamics.com), shown in Figure 4. The eMonitor comes with current transformers (CTs) that attach to each individual circuit of the home in a house's circuit breaker panel; the version we use monitors up to 24 circuits independently. However, the eMonitor reports power consumption to a central server at a maximum rate of once per minute. Since we are looking for more frequent power readings, we directly request measurements from the monitor using its API at the highest rate possible (for the current hardware, about one reading for all the circuits every 3 seconds).

Figure 4: The eMonitor, designed and built by Powerhouse Dynamics, Inc.

To measure whole-home AC waveforms at high frequency, we use CTs from a TED (http://www.theenergydetective.com) to measure current in the power mains, a Pico TA041 oscilloscope probe (http://www.picotechnologies.com) to measure voltage for one of the two phases in the home, and a National Instruments NI-9239 analog to digital converter to transform both these analog signals to digital readings. This A/D converter has 24 bit resolution with noise of approximately 70 μV, which determines the noise level of our current and voltage readings: the TED CTs are rated for 200 amp circuits and a maximum of 3 volts, so we are able to differentiate between currents of approximately $(200)(70 \times 10^{-6})/3 \approx 4.66$ mA, corresponding to power changes of about 0.5 watts. Similarly, since we use a 1:100 voltage step down in the oscilloscope probe, we can detect voltage differences of about 7 mV. All the data is sent to a laptop, which logs the data and sends a subset of the raw data to our central server.

Finally, since the system contains a number of electronics in close proximity to the circuit breaker box (the eMonitor, A/D converter, oscilloscope probe, computer, external hard drive, and various power supplies/cables), we have built small boxes, dubbed "REDD Boxes," to contain all these parts in a single unit. A picture of the complete system as it would be installed near a circuit breaker box is shown in Figure 5.

Figure 5: The REDD Box, installed in a home (left) and showing internals (right).

3.2 Software System

The software system on the laptop in each REDD Box contains all the logic to query data from each of the monitors, store the readings locally, and send processed information (power from each of the monitors at a maximum of 1 Hz) to a central database. Recall that we log two phases of current and one phase of voltage at 15 kHz; readings from the A/D converter are 24-bit, resulting in 11 GB of data logged from each home per day (which we can compress 1.5-3X using bzip2). It is infeasible to send this much information over the network, so we log the data locally to an external hard disk, which we manually collect periodically and copy to the REDD server. Since developing new features from AC waveforms is a principal research direction for disaggregation methods, we include this full data for researchers to analyze if desired, though we also compute simple power information from the signal.

In addition to the software running locally at each home, we have a central database that stores power readings from all the homes, as well as a web interface that displays the real-time status of the system and allows users to see recent data from the houses. A view of the web interface is shown in Figure 6.

Figure 6: Live web interface displaying power consumption at the circuit level in a home.

3.3 Privacy Considerations

Finally, given the nature of this public data set, we want to briefly discuss the privacy concerns involved. Sharing someone's real-time power data in addition to their identity is potentially quite harmful: in addition to simply being able to estimate private information such as the amount of time someone spends watching television, it would be quite easy to determine if someone was home or not based upon their power usage. For these reasons, we (1) store no identifying information about the houses in the database, and disclose only that they are in the greater Boston area, and (2) release only historical data, and keep the live portion of the website for private use alone. (All participants in the study are made aware of these stipulations.) Although privacy concerns are still an issue that requires constant monitoring, our hope is that these safeguards greatly decrease the risk of disclosing or identifying personal information from the data.

4. EXPERIMENTAL RESULTS

Here we present examples of simple algorithms applied to REDD. The goal of this section is not to present state-of-the-art performance results, but rather to demonstrate the performance of a well-studied algorithm for this task and highlight some of the challenges for future work.

We focus on the Factorial Hidden Markov Model (FHMM) [4], which has been considered recently as a method for disaggregation [7]. In the FHMM, each of the $n$ devices (or circuits) in the home is described via a Hidden Markov Model (HMM). Each device has a discrete hidden state, denoted $x_t^{(i)} \in \{1, \ldots, N_i\}$ for the state at time $t$ for device $i$, which corresponds roughly to the internal state of the device ("off", or in one of several possible "on" states). At each time $t$, given the internal state, the $i$th device emits a Gaussian-distributed power, denoted $y_t^{(i)}$, with state-specific mean and variance parameters. However, we only observe the sum of all the power outputs at each time, $y_t = \sum_{i=1}^{n} y_t^{(i)}$. The disaggregation task can then be framed as an inference problem: given an observed sequence of aggregate energy $y_1, \ldots, y_T$, we aim to compute the posterior probability of the individual device consumptions $y_t^{(i)}$, $i = 1, \ldots, n$, $t = 1, \ldots, T$. A graphical model representing this FHMM is shown in Figure 7.

Figure 7: Graphical model representation of the Factorial Hidden Markov Model (FHMM).

Table 1: Description of the houses and devices used in the evaluation.

House  Monitors  Device Categories
1      20        Electronics, Lighting, Refrigerator, Disposal, Dishwasher, Furnace, Washer Dryer, Smoke Alarms, Bathroom GFI, Kitchen Outlets, Microwave
2      19        Lighting, Refrigerator, Dishwasher, Washer Dryer, Bathroom GFI, Kitchen Outlets, Oven, Microwave, Electric Heat, Stove
3      24        Electronics, Lighting, Refrigerator, Disposal, Dishwasher, Furnace, Washer Dryer, Bathroom GFI, Kitchen Outlets, Microwave, Electric Heat, Outdoor Outlets
4      19        Lighting, Dishwasher, Furnace, Washer Dryer, Smoke Alarms, Bathroom GFI, Kitchen Outlets, Stove, Disposal, Air Conditioning
5      10        Lighting, Refrigerator, Disposal, Dishwasher, Washer Dryer, Kitchen Outlets, Microwave, Stove

Although training and inference in an FHMM are nontrivial, the algorithms are described in detail in other work, and so we only discuss them briefly here and include code for the algorithm in the REDD release. To build the model from data we use the individual appliance energy sequences, as collected by the individual device monitors, and train HMMs using the standard Baum-Welch (EM) algorithm (thus, the algorithm we are describing falls under the "supervised" designation of Section 2). Exact posterior inference in the FHMM model is not tractable (we use 4 states per device, and typically around 20 devices per home, for a total of $4^{20} \approx 10^{12}$ different combinations of hidden states), so we use a blocked Gibbs sampling scheme: we fix the hidden states of all but one of the chains, resulting in a Gaussian posterior over the emissions for the remaining chain; at this point, we can efficiently sample over hidden states for the held-out chain, and repeat the process until the distribution over all hidden states mixes. (We also anneal the sampling procedure by artificially inflating the variance of the observed aggregate outputs during the early iterations of Gibbs sampling.)

To evaluate the method, we used 2 weeks of data from 5 of the houses in REDD; since the plug-level monitors had not yet collected sufficient data at the time of writing, we use the whole-home and circuit-level data. A description of the devices in each of these homes is given in Table 1. In the presented experiments we sub-sampled the data to 10 second intervals using a median filter. To evaluate the performance of the method, we used the "total energy correctly assigned" metric described in Section 2, defined formally as

$$\mathrm{Acc} = 1 - \frac{\sum_{t=1}^{T} \sum_{i=1}^{n} \left| \hat{y}_t^{(i)} - y_t^{(i)} \right|}{2 \sum_{t=1}^{T} y_t} \qquad (1)$$

where $\hat{y}_t^{(i)}$ denotes the algorithm's prediction for the $i$th device at the $t$th time step, and where the factor of 2 in the denominator comes from the fact that the absolute value will "double count" errors, since $\sum_{i=1}^{n} y_t^{(i)} = \sum_{i=1}^{n} \hat{y}_t^{(i)}$.

Table 2: Percentage of total energy classified correctly for different houses, using FHMM disaggregation and a simple model that predicts the device's average consumption percentage at each time.

         FHMM            Simple Mean
House    Train   Test    Train   Test
1        71.5%   46.6%   41.4%   21.5%
2        59.6%   50.8%   39.0%   36.7%
3        59.6%   33.3%   46.7%   18.8%
4        69.0%   52.0%   52.7%   32.5%
6        62.9%   55.7%   33.7%   19.8%
Total    64.5%   47.7%   42.7%   25.9%

Table 2 shows the disaggregation performance of the FHMM model on the five houses we consider. We focus on two testing procedures: in the first case we build HMM models from devices in a given house, and then attempt to disaggregate energy in that house; in the second case, we train on four of the houses and test on the remaining held-out house. This procedure is analogous to "training" versus "testing" error, and we label the results accordingly in Table 2. For comparison we also show the performance of a simple mean prediction algorithm, which estimates the total percentage that each device type consumes and predicts that the total energy breaks down according to this percentage at all times.

As seen in Table 2, the FHMM is able to disaggregate the power data reasonably well; as expected, there is a significant drop in accuracy when moving from training prediction to test prediction, but the FHMM method still works substantially better than simple mean prediction. Although average accuracies of around 50% may seem low, we emphasize that this is for the case of predicting a previously unseen set of devices, and this metric measures the percentage of total energy correctly classified at each 10 second interval. If we aggregate the predictions over a longer time horizon, then errors tend to "cancel out," and we often obtain much higher accuracy. Figure 8, for example, shows the total true and predicted energy for House 5 (training only on houses 1-4), summed over two weeks. At this level of aggregation, the method classifies 82% of the energy correctly, and such "aggregated" charts have significant value for user feedback.

Figure 8: Pie chart showing predicted and actual energy consumed for House 5 (when the FHMM is trained only on houses 1-4), averaged over the course of two weeks.

5. CONCLUSION

This paper has introduced REDD, a data set for energy disaggregation research. Energy disaggregation is an algorithmic challenge where advances can have a real impact on energy efficiency and sustainability. We have described the hardware and software setup and demonstrated a standard algorithm, the FHMM, for the disaggregation task.

Our ultimate goal in developing REDD, however, is to provide an easily accessible data set for researchers working in data mining or machine learning. Thus, we highlight the fact that while FHMMs performed reasonably well in the experiments we presented, there is also much room for improvement. For example, Figure 9 shows an actual and predicted signal for the refrigerator in House 5; although the FHMM sometimes extracts the signal correctly, it also often fails to detect the refrigerator or estimates a noisy and unrelated signal. Many modifications, such as including explicit durations via an HSMM [15], incorporating hard constraints on device signals, or looking at more complex features of the power signal, can all help to improve this performance. Of particular interest to us is how such techniques could extend to generalize across different devices in multiple homes. We are also excited by the prospect of semi-supervised techniques for disaggregation; while REDD aims to be a large resource, we can only outfit so many homes with such detailed sensing, and a great challenge that remains is to discover ways to merge this type of high-fidelity measurement with the massive amounts of (unlabeled) smart meter data that utility companies currently generate. Our hope is that the availability of a data set such as REDD can further motivate the machine learning and data mining communities to tackle this problem.

Figure 9: Predicted versus actual energy signal for the refrigerator in House 5.

Acknowledgments

This work was supported by ARPA-E (Advanced Research Projects Agency-Energy) under grant number DE-AR0000018. J. Zico Kolter is supported by a National Science Foundation Computing Innovation Fellowship. Matt J. Johnson is supported by a National Science Foundation Graduate Research Fellowship. We thank Carrie Armel and Mario Berges for helpful discussions. We thank Enmetric Systems and Powerhouse Dynamics for their assistance in using their hardware for this project.

6. REFERENCES

[1] S. Darby. The effectiveness of feedback on energy consumption. Technical report, Environmental Change Institute, University of Oxford, 2006.
[2] M. Everingham, L. V. Gool, C. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303–338, 2010.
[3] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 106(1):59–70, 2007.
[4] Z. Ghahramani and M. I. Jordan. Factorial hidden Markov models. Machine Learning, 29(2–3):245–273, 1997.
[5] S. Gupta, S. Reynolds, and S. N. Patel. ElectriSense: Single-point sensing using EMI for electrical event detection and classification in the home. In Proceedings of the Conference on Ubiquitous Computing, 2010.
[6] G. Hart. Nonintrusive appliance load monitoring. Proceedings of the IEEE, 80(12), 1992.
[7] H. Kim, M. Marwah, M. Arlitt, G. Lyon, and J. Han. Unsupervised disaggregation of low frequency power measurements. In Proceedings of the SIAM Conference on Data Mining, 2011.
[8] J. Z. Kolter, S. Batra, and A. Y. Ng. Energy disaggregation via discriminative sparse coding. In Neural Information Processing Systems, 2010.
[9] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
[10] M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini. Building a large annotated corpus of English: the Penn treebank. Computational Linguistics: Special issue on using large corpora: II, 19(2), 1993.
[11] Multiple. Annual energy review 2009. Technical report, U.S. Energy Information Administration, 2009.
[12] National Institute of Health. NIH data sharing policy and implementation guidance. http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm, 2003.
[13] B. Neenan and J. Robinson. Residential electricity use feedback: A research synthesis and economic framework. Technical report, Electric Power Research Institute, 2009.
[14] S. N. Patel, T. Robertson, J. A. Kientz, M. S. Reynolds, and G. Abowd. At the flick of a switch: detecting and classifying unique electrical events on the residential power line. In Proceedings of the Conference on Ubiquitous Computing, 2006.
[15] L. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, February 1989.
[16] Science Magazine. Science magazine: General policies. http://www.sciencemag.org/site/help/authors/policies.xhtml, 2011.
[17] U.S. Department of Energy. Open energy info. http://www.openei.org, 2011.
[18] M. Zeifman and K. Roth. Nonintrusive appliance load monitoring: Review and outlook. IEEE Transactions on Consumer Electronics, 57(1):76–84, 2011.
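
The "total energy correctly assigned" metric of Equation (1) in Section 4 can be sketched in a few lines of Python. This is an illustrative sketch, not the code shipped with the REDD release: the function name `disaggregation_accuracy` and the nested-list layout (`predicted[t][i]` for device i at time step t) are assumptions made for the example, and the aggregate signal is taken to be the sum of the true device signals.

```python
def disaggregation_accuracy(predicted, actual):
    """Fraction of total energy correctly assigned, following Equation (1).

    predicted[t][i] and actual[t][i] hold the power of device i at time step t.
    (Hypothetical data layout chosen for this sketch.)
    """
    abs_error = 0.0
    total = 0.0
    for pred_t, act_t in zip(predicted, actual):
        if len(pred_t) != len(act_t):
            raise ValueError("each time step needs one value per device")
        # Numerator of Eq. (1): absolute per-device error at this time step.
        abs_error += sum(abs(p - a) for p, a in zip(pred_t, act_t))
        # Denominator of Eq. (1): aggregate signal y_t = sum_i y_t^(i).
        total += sum(act_t)
    if total == 0:
        raise ValueError("aggregate energy is zero; accuracy is undefined")
    return 1.0 - abs_error / (2.0 * total)


# Two devices, two time steps; the first step swaps the two device loads.
actual = [[100.0, 50.0], [0.0, 50.0]]
predicted = [[50.0, 100.0], [0.0, 50.0]]
print(disaggregation_accuracy(predicted, actual))  # 0.75
```

In the example, 50 W is misassigned in each direction at the first time step, so the absolute error is 100 W against 200 W of total energy; the factor of 2 prevents that single misassignment from being counted twice.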
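
The simple mean prediction baseline of Section 4 (each device is assumed to consume a fixed percentage of the aggregate at all times) might look roughly like the following. The function names and data layout are hypothetical, and the paper does not spell out how the percentages are estimated; this sketch assumes they are each device's share of total training energy.

```python
def fit_energy_fractions(train):
    """Estimate each device's share of total energy from training data.

    train[t][i] is the power of device i at time step t (assumed layout).
    """
    n_devices = len(train[0])
    totals = [0.0] * n_devices
    for row in train:
        for i, value in enumerate(row):
            totals[i] += value
    grand_total = sum(totals)
    if grand_total == 0:
        raise ValueError("training data contains no energy")
    return [t / grand_total for t in totals]


def predict_simple_mean(fractions, aggregate):
    """Split each aggregate reading according to the fixed fractions."""
    return [[f * y for f in fractions] for y in aggregate]


train = [[30.0, 10.0], [30.0, 10.0]]      # device 0 uses 75% of the energy
fractions = fit_energy_fractions(train)
print(fractions)                           # [0.75, 0.25]
print(predict_simple_mean(fractions, [40.0, 80.0]))
# [[30.0, 10.0], [60.0, 20.0]]
```

Because the predictions always sum to the aggregate reading, this baseline satisfies the equal-totals condition that motivates the factor of 2 in Equation (1).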
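
Section 4 states that the data was sub-sampled to 10 second intervals using a median filter, without giving details. One plausible reading, sketched below, takes the median over non-overlapping windows and drops any incomplete trailing window. The function name `median_downsample` and the windowing choice are assumptions, not the authors' documented procedure.

```python
from statistics import median


def median_downsample(samples, window):
    """Median of each non-overlapping block of `window` samples.

    A trailing block shorter than `window` is dropped (an assumption of
    this sketch).
    """
    if window <= 0:
        raise ValueError("window must be a positive number of samples")
    last_start = len(samples) - window
    return [median(samples[s:s + window])
            for s in range(0, last_start + 1, window)]


# 1 Hz readings with a one-sample spike; window of 5 samples.
readings = [5, 5, 900, 5, 5, 60, 61, 59, 60, 60, 7]
print(median_downsample(readings, 5))  # [5, 60]
```

A median is less sensitive than a mean to short transients such as the 900 W spike above, which is one reason it suits this kind of sub-sampling.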
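
The back-of-the-envelope figures quoted in Sections 3.1, 3.2 and 4 (current and voltage resolution, daily storage volume, and the size of the FHMM state space) can be checked with a short calculation. The constants are taken from the text; the 120 V mains voltage used to convert current resolution to power is an assumption of this sketch, as is treating each 24-bit sample as exactly 3 bytes.

```python
ADC_NOISE_V = 70e-6        # approximate A/D converter noise (70 microvolts)
CT_RATED_AMPS = 200        # TED current transformer rating
CT_MAX_VOLTS = 3           # CT output at rated current
PROBE_STEP_DOWN = 100      # 1:100 oscilloscope probe
MAINS_VOLTS = 120          # assumed nominal mains voltage (not in the text)

# Smallest distinguishable current and voltage differences.
current_resolution_a = CT_RATED_AMPS * ADC_NOISE_V / CT_MAX_VOLTS
voltage_resolution_v = ADC_NOISE_V * PROBE_STEP_DOWN
power_resolution_w = current_resolution_a * MAINS_VOLTS

# Raw storage: 2 current phases + 1 voltage phase, 15 kHz, 24-bit samples.
CHANNELS, SAMPLE_RATE_HZ, BYTES_PER_SAMPLE = 3, 15_000, 3
bytes_per_day = CHANNELS * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * 86_400

# Joint hidden-state count for 20 devices with 4 states each.
joint_states = 4 ** 20

print(f"{current_resolution_a * 1e3:.2f} mA")   # 4.67 mA
print(f"{voltage_resolution_v * 1e3:.1f} mV")   # 7.0 mV
print(f"{power_resolution_w:.2f} W")            # 0.56 W
print(f"{bytes_per_day / 1e9:.2f} GB/day")      # 11.66 GB/day
print(joint_states)                             # 1099511627776
```

These values agree with the paper's figures of roughly 4.66 mA, 7 mV, 0.5 W, 11 GB per day, and about 10^12 state combinations, up to rounding.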
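
Section 2 notes that both real and reactive power can be computed from the sampled AC waveform. The sketch below uses the textbook definitions: real power is the mean of instantaneous voltage times current, apparent power is the product of the RMS values, and the magnitude of reactive power follows from the two. It assumes sinusoidal signals sampled over a whole number of cycles (with harmonics present, the last step would lump distortion into the result) and does not recover the sign of the reactive power. The function name is hypothetical and this is not the REDD processing code.

```python
import math


def real_and_reactive_power(voltage, current):
    """Return (real power in W, reactive power magnitude in var)."""
    n = len(voltage)
    if n == 0 or n != len(current):
        raise ValueError("need equally long, non-empty sample sequences")
    real = sum(v * i for v, i in zip(voltage, current)) / n
    v_rms = math.sqrt(sum(v * v for v in voltage) / n)
    i_rms = math.sqrt(sum(i * i for i in current) / n)
    apparent = v_rms * i_rms
    # Clamp at zero to guard against tiny negative values from rounding.
    reactive = math.sqrt(max(apparent ** 2 - real ** 2, 0.0))
    return real, reactive


# One 60 Hz cycle sampled at 15 kHz (250 samples); current lags by 30 degrees.
RATE_HZ, MAINS_HZ = 15_000, 60
samples = RATE_HZ // MAINS_HZ
lag = math.pi / 6
volts = [170.0 * math.sin(2 * math.pi * k / samples) for k in range(samples)]
amps = [10.0 * math.sin(2 * math.pi * k / samples - lag) for k in range(samples)]

p, q = real_and_reactive_power(volts, amps)
print(round(p, 2), round(q, 2))  # 736.12 425.0
```

For these test signals the closed-form values are 850·cos(30°) ≈ 736.12 W and 850·sin(30°) = 425 var, which the sampled computation reproduces.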