REDD: A Public Data Set for Energy Disaggregation Research

J. Zico Kolter
Computer Science and Artificial Intelligence Laboratory
Massachusetts Institute of Technology
Cambridge, MA
kolter@csail.mit.edu

Matthew J. Johnson
Laboratory for Information and Decision Systems
Massachusetts Institute of Technology
Cambridge, MA
mattjj@csail.mit.edu

ABSTRACT

Energy and sustainability issues raise a large number of problems that can be tackled using approaches from data mining and machine learning, but traction on such problems has been slow due to the lack of publicly available data. In this paper we present the Reference Energy Disaggregation Data Set (REDD), a freely available data set containing detailed power usage information from several homes, which is aimed at furthering research on energy disaggregation (the task of determining the component appliance contributions from an aggregated electricity signal). We discuss past approaches to disaggregation and how they have influenced our design choices in collecting data, we describe the hardware and software setups for the data collection, and we present initial benchmark disaggregation results using a well-known Factorial Hidden Markov Model (FHMM) technique.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
SustKDD 2011, August 2011, San Diego, CA, USA.
Copyright 2011 ACM 978-1-4503-0840-3...$10.00.

1. INTRODUCTION

Energy and sustainability problems represent one of the greatest challenges facing society. More than 83% of the world's energy comes from (unsustainable) fossil fuels, with renewable energy from wind, solar, geothermal and biomass making up only approximately 2% of the total [11]. Meanwhile, the demand for energy is constantly growing: worldwide energy production grew by 46% in the 20 years from 1987 to 2007 [11]. The simple physical limits of our current energy resources, as well as the environmental and climate impact of burning massive amounts of fossil fuels, make a research focus on issues of sustainability imperative. Furthermore, there are numerous problems in sustainability that are fundamentally data analysis and prediction tasks, areas where techniques from data mining and machine learning can prove invaluable.

Despite the importance of sustainability research and the relevance of data mining and machine learning techniques, there has been relatively little work in these areas, at least compared to other application areas such as computational biology or machine vision. We argue that this situation is at least partly due to the scarcity of publicly available data for such domains. For example, although there are vast amounts of data relevant to energy domains (the energy consumption of each individual building and household in the country, the loading of each electrical transmission and distribution line), the majority of this data is unavailable to researchers. Furthermore, there is significant evidence that publicly available data sets have spurred previous application areas in machine learning and data mining: biological applications have been aided greatly by the data sharing mandates of biological journals and government organizations [16, 12]; many early successes in natural language processing were spawned by the now-classic Wall Street Journal corpus [10]; and machine vision research has been aided greatly by common benchmark data sets such as MNIST digit recognition [9], CalTech 101 [3], and the PASCAL challenge [2]. Despite some initial progress towards this same goal for energy and sustainability domains [17], there are currently few such data sets geared to the ML and data mining communities.

Figure 1: An example of energy consumption over the course of a day for one of the houses in REDD.

In this paper, we present our work on developing a public data set of this type, termed the Reference Energy Disaggregation Data Set (REDD). The data is specifically geared towards the task of energy disaggregation: determining the component devices from an aggregated electricity signal. REDD consists of whole-home and circuit/device specific electricity consumption for a number of real houses over several months' time. For each monitored house, we record (1) the whole-home electricity signal (current monitors on both phases of power and a voltage monitor on one phase) recorded at a high frequency (15 kHz); (2) up to 24 individual circuits in the home, each labeled with its category of appliance or appliances, recorded at 0.5 Hz; (3) up to 20 plug-level monitors in the home, recorded at 1 Hz, with a focus on logging electronics devices where multiple devices are grouped to a single circuit. An example of this type of data is shown in Figure 1. As of the time of writing (June 15th, 2011), we have 10 homes monitored, with a total of 119 days of data (combined over all homes), 268 unique monitors, and more than 1 terabyte of raw data. To the best of our knowledge, REDD represents the largest publicly available data set for disaggregation with the true loads of each house identified. The entirety of the data, as well as code for parsing the data and running basic algorithms, is publicly available on the web.

While we present some basic results on disaggregation here, the focus of this paper is the data set itself: the design decisions that went into the data collection, as well as the hardware and software system. We begin by presenting a brief overview of existing work on disaggregation and discuss how this influenced our choices of which data to collect from each home and at what frequency. We then describe the software and hardware systems we have built for this task, and discuss their strengths and limitations. Finally, we present brief results on the data, and highlight several directions for future algorithmic work.

2. ENERGY DISAGGREGATION

Energy disaggregation, also referred to as non-intrusive load monitoring (NILM),¹ is the task of using an aggregate energy signal, such as that coming from a whole-home power monitor, to make inferences about the different individual loads of the system. The value of this technology is that information about individual appliances is much more useful to consumers than simply total electricity usage; studies have shown that user feedback of this type can induce behavior changes that improve user efficiency by 15% [1, 13]. Disaggregation technology is also seen as an intermediate between existing electricity meters (which merely record whole-home power usage at some frequency) and a fully energy-aware home appliance network, where each device reports its consumption to a central location; an oft-stated goal of disaggregation research is to push energy awareness to a ubiquitous level, paving the way for more detailed energy monitoring in the future.

¹ Some authors make subtle distinctions between energy disaggregation and NILM, but for our purposes we treat these terms as synonymous.

Academic work on energy disaggregation began with the work of Hart et al. [6] in the 1980s and 1990s. The initial approaches look for sharp edges (corresponding to device on/off events) in both the real and reactive power signals, and cluster devices according to these changes in consumption. Later work has explored a number of different directions: using more complex device models with multiple states, integrating frequency analysis and other features of the AC waveforms, and making use of external features such as time of day or weather conditions. A recent review of numerous existing techniques for energy disaggregation can be found in [18]. In this paper, we highlight some of the key distinctions which have characterized past work in energy disaggregation and how they have informed our choices for REDD.

Frequency of Measurements. Past work has spanned a broad range in terms of the frequency of energy measurements used for disaggregation: some work has used average power measurements over periods as long as an hour [8], while others have analyzed the harmonics of AC waveforms using MHz resolutions [14, 5].² Most approaches fall somewhere in between these two extremes, with many studies either using power readings on the order of a 1 Hz rate or AC current measurements on the order of several kilohertz. Since higher-frequency measurements can be sub-sampled to produce lower-frequency data, for our purposes of data collection it makes sense to collect data at the highest frequency possible, up to the feasibility of storing the data. We chose 15 kHz monitoring (for the whole-home data) as a trade-off between these factors.

² The work of Patel et al. is substantially different from most other approaches to disaggregation, as they use high frequency measurements to look for transients of the voltage signal of the home, and not necessarily the current.

Real/Reactive Power. Past work has also differed in whether the methods consider only the real power signal or both the real and reactive powers.³ This decision is connected to the point above, since real and reactive powers can be computed using measurements of the AC waveform, but reactive power is a common enough quantity to merit its own distinction. For REDD, since we are collecting the AC waveform itself, we can easily compute both real and reactive powers.

³ Since the data mining community may not be familiar with this terminology, briefly, real power corresponds to the power that is actually consumed by an appliance, whereas reactive power corresponds to current that flows through a circuit, but is put back into the system, typically via an inductive load in the appliance. Any text on AC power will include a rigorous description of these quantities.

Use of External Features. Some past approaches use external features such as time of day, day of year, or weather information, whereas some merely use the power signal itself. All data in REDD is recorded with UTC timestamps, along with general geographical information (only up to a city level, for privacy reasons), so that it can be associated with such external features.

Supervised/Unsupervised Training. Most approaches to energy disaggregation have been supervised, in that the system is trained on individual device power signals (or is given manually identified device change-points in a whole-home energy signal). Alternatively, some recent work has advocated unsupervised approaches that consider the whole-home signal without labeling, and automatically separate different signals [7]. To facilitate supervised approaches and to aid in evaluating all approaches, REDD includes as much "supervised" information as possible: we monitor each individual circuit in the home (especially important for large loads that cannot be easily monitored by a plug load) as well as as many large plug loads as is feasible.

Training/Testing Generalization. Another key distinction (which has not been greatly considered in past energy disaggregation work) is generalizing from training data to test data. The vast majority of previous disaggregation approaches (at least those with rigorous quantitative evaluation) have typically evaluated the algorithms on the same devices (but in different conditions) as they were trained on; that is, they attempt to build a model that can disaggregate a given appliance even in new conditions, but do not attempt to build models that explicitly generalize across multiple different devices of the same category. In our own
past work [8] we have considered this challenge of generalizing across multiple homes, but the data used in that work was only available at an extremely low resolution (one hour), and was not permitted to be made publicly available, greatly limiting the ability of researchers to directly compare to the approach. In contrast, a goal of REDD is to consider several different homes, such that work that attempts to generalize across device types can be rigorously evaluated using this data set. As expected, and as we illustrate concretely in Section 4, generalization across homes and device categories makes disaggregation a much more challenging problem.

Figure 2: Schematic of the different components of the REDD hardware and software system.

Evaluation Metrics. Finally, previous work in power disaggregation has used different metrics for evaluating performance: initial work typically focused only on on/off changes for devices, and the natural metric here is whether the algorithm can correctly classify which device is turning on or off given a change point in the whole-home signal. An alternative approach is to look at the percentage of energy correctly classified (the original work by Hart et al. [6] considered both these metrics). The latter has the advantage that it is more generally applicable to disaggregation tasks, since it does not rely on extracting edges in the aggregate power signal, and applies to devices with multiple states or "smooth" power ons. This metric naturally weights high-power devices more heavily than low-power devices. While we argue that this feature is often desirable, since the absolute power consumed is the ultimate quantity we hope to influence, the metric may indicate good performance even when low-power devices are classified poorly, and in some cases these low-power devices are those over which the user has greatest control. Thus, while we will consider the "total energy properly classified" metric in our experiments, REDD can accommodate many performance metrics.

3. THE REDD HARDWARE AND SOFTWARE SYSTEMS

We developed the REDD hardware and software systems with the considerations of the previous section in mind. The hardware system in each house logs the whole-home current and voltage (at high frequency), as well as data from each individual circuit and from selected plugs. The data is logged both locally and to a central database, which stores information from all the houses and can be accessed via a web interface. A schematic of the system is shown in Figure 2.

3.1 Hardware Setup

For plug-level data, we use a wireless plug monitoring system developed by Enmetric (http://www.enmetric.com), shown in Figure 3. The system consists of several power strips, each containing four independently monitored outlets, and a router that connects to the home's internet connection via DHCP and processes the reading from each of the wireless devices. Energy information is then sent to a central server at a rate of 1 Hz. Because the system reports at a sufficient rate and is fairly easy to install in most homes, we use the system as-is for the plug-level data collection.

Figure 3: Enmetric router and Power Port, designed and built by Enmetric Systems, Inc.

Circuit-level data and whole-home data require a more involved setup. For circuit-level data, we again make use of an off-the-shelf solution: the eMonitor, developed by Powerhouse Dynamics (http://www.powerhousedynamics.com), shown in Figure 4. The eMonitor comes with current transformers (CTs) that attach to each individual circuit of the home in a house's circuit breaker panel; the version we use monitors up to 24 circuits independently. However, the eMonitor reports power consumption to a central server at a maximum rate of once per minute. Since we are looking for more frequent power readings, we directly request measurements from the monitor using its API at the highest rate possible (for the current hardware, about one reading for all the circuits every 3 seconds).

Figure 4: The eMonitor, designed and built by Powerhouse Dynamics, Inc.

To measure whole-home AC waveforms at high frequency, we use CTs from a TED (http://www.theenergydetective.com) to measure current in the power mains, a Pico TA041 oscilloscope probe (http://www.picotechnologies.com) to measure voltage for one of the two phases in the home, and a National Instruments NI-9239 analog to digital converter to transform both these analog signals to digital readings. This A/D converter has 24 bit resolution with noise of approximately 70 µV, which determines the noise level of our current and voltage readings: the TED CTs are rated for 200 amp circuits and a maximum of 3 volts, so we are able to differentiate between currents of approximately $(200)(70 \times 10^{-6})/(3) = 4.66$ mA, corresponding to power changes of about 0.5 watts. Similarly, since we use a 1:100 voltage step down in the oscilloscope probe, we can detect voltage differences of about 7 mV. All the data is sent to a laptop, which logs the data and sends a subset of the raw data to our central server.

Finally, since the system contains a number of electronics in close proximity to the circuit breaker box (the eMonitor, A/D converter, oscilloscope probe, computer, external hard drive, and various power supplies/cables), we have built small boxes, dubbed "REDD Boxes," to contain all these parts in a single unit. A picture of the complete system as it would be installed near a circuit breaker box is shown in Figure 5.

Figure 5: The REDD Box, installed in a home (left) and showing internals (right).

3.2 Software System

The software system on the laptop in each REDD Box contains all the logic to query data from each of the monitors, store the readings locally, and send processed information (power from each of the monitors at a maximum of 1 Hz) to a central database. Recall that we log two phases of current and one phase of voltage at 15 kHz; readings from the A/D converter are 24-bit, resulting in 11 GB of data logged from each home per day (which we can compress 1.5-3X using bzip2). It is infeasible to send this much information over the network, so we log the data locally to an external hard disk, which we manually collect periodically and copy to the REDD server. Since developing new features from AC waveforms is a principal research direction for disaggregation methods, we include this full data for researchers to analyze if desired, though we also compute simple power information from the signal.

In addition to the software running locally at each home, we have a central database that stores power readings from all the homes, as well as a web interface that displays the real-time status of the system and allows users to see recent data from the houses. A view of the web interface is shown in Figure 6.

Figure 6: Live web interface displaying power consumption at the circuit level in a home.

3.3 Privacy Considerations

Finally, given the nature of this public data set, we want to briefly discuss the privacy concerns involved. Sharing someone's real-time power data in addition to their identity is potentially quite harmful: in addition to simply being able to estimate private information such as the amount of time someone spends watching television, it would be quite easy to determine if someone was home or not based upon their power usage. For these reasons, we (1) store no identifying information about the houses in the database, and disclose only that they are in the greater Boston area, and (2) release only historical data, and keep the live portion of the website for private use alone. (All participants in the study are made aware of these stipulations.) Although privacy concerns are still an issue that requires constant monitoring, our hope is that these safeguards greatly decrease the risk of disclosing or identifying personal information from the data.

4. EXPERIMENTAL RESULTS

Here we present examples of simple algorithms applied to REDD. The goal of this section is not to present state-of-the-art performance results, but rather to demonstrate the performance of a well-studied algorithm for this task and highlight some of the challenges for future work.

We focus on the Factorial Hidden Markov Model (FHMM) [4], which has been considered recently as a method for disaggregation [7]. In the FHMM, each of the $n$ devices (or circuits) in the home is described via a Hidden Markov Model (HMM). Each device has a discrete hidden state, denoted $x_t^{(i)} \in \{1, \ldots, N_i\}$ for the state at time $t$ for device $i$, which corresponds roughly to the internal state of the device ("off", or in one of several possible "on" states). At each time $t$, given the internal state, the $i$th device emits a Gaussian-distributed power, denoted $y_t^{(i)}$, with state-specific mean and variance parameters. However, we only observe the sum of all the power outputs at each time, $\bar{y}_t = \sum_{i=1}^{n} y_t^{(i)}$.

Figure 7: Graphical model representation of the Factorial Hidden Markov Model (FHMM).

The disaggregation task can then be framed as an inference problem: given an observed sequence of aggregate energy $\bar{y}_1, \ldots, \bar{y}_T$, we aim to compute the posterior probability of the individual device consumptions $y_t^{(i)}$, $i = 1, \ldots, n$, $t = 1, \ldots, T$. A graphical model depicting this FHMM is shown in Figure 7.

House | Monitors | Device Categories
1 | 20 | Electronics, Lighting, Refrigerator, Disposal, Dishwasher, Furnace, Washer Dryer, Smoke Alarms, Bathroom GFI, Kitchen Outlets, Microwave
2 | 19 | Lighting, Refrigerator, Dishwasher, Washer Dryer, Bathroom GFI, Kitchen Outlets, Oven, Microwave, Electric Heat, Stove
3 | 24 | Electronics, Lighting, Refrigerator, Disposal, Dishwasher, Furnace, Washer Dryer, Bathroom GFI, Kitchen Outlets, Microwave, Electric Heat, Outdoor Outlets
4 | 19 | Lighting, Dishwasher, Furnace, Washer Dryer, Smoke Alarms, Bathroom GFI, Kitchen Outlets, Stove, Disposal, Air Conditioning
5 | 10 | Lighting, Refrigerator, Disposal, Dishwasher, Washer Dryer, Kitchen Outlets, Microwave, Stove

Table 1: Description of the houses and devices used in the evaluation.

Although training and inference in an FHMM are nontrivial, the algorithms are described in detail in other work, and so we only discuss them briefly here and include code for the algorithm in the REDD release. To build the model from data we use the individual appliance energy sequences, as collected by the individual device monitors, and train HMMs using the standard Baum-Welch (EM) algorithm (thus, the algorithm we are describing falls under the "supervised" designation of Section 2). Exact posterior inference in the FHMM model is not tractable (we use 4 states per device, and typically around 20 devices per home, for a total of $4^{20} \approx 10^{12}$ different combinations of hidden states), so we use a blocked Gibbs sampling scheme: we fix the hidden states of all but one of the chains, resulting in a Gaussian posterior over the emissions of the remaining chain; at this point, we can efficiently sample over hidden states for the held-out chain, and repeat the process until the distribution over all hidden states mixes. (We also anneal the sampling procedure by artificially inflating the variance of the observed aggregate outputs during the early iterations of Gibbs sampling.)

To evaluate the method, we used 2 weeks of data from 5 of the houses in REDD; since the plug-level monitors had not yet collected sufficient data at the time of writing, we use the whole-home and circuit-level data. A description of the devices in each of these homes is given in Table 1. In the presented experiments we sub-sampled the data to 10 second intervals using a median filter. To evaluate the performance of the method, we used the "total energy correctly assigned" metric described in Section 2, defined formally as

$$\mathrm{Acc} = 1 - \frac{\sum_{t=1}^{T} \sum_{i=1}^{n} \left| \hat{y}_t^{(i)} - y_t^{(i)} \right|}{2 \sum_{t=1}^{T} \bar{y}_t} \qquad (1)$$

where $\hat{y}_t^{(i)}$ denotes the algorithm's prediction for the $i$th device at the $t$th time step, and where the factor of 2 in the denominator comes from the fact that the absolute value will "double count" errors, since $\sum_{i=1}^{n} y_t^{(i)} = \sum_{i=1}^{n} \hat{y}_t^{(i)}$.

House | FHMM Train | FHMM Test | Simple Mean Train | Simple Mean Test
1 | 71.5% | 46.6% | 41.4% | 21.5%
2 | 59.6% | 50.8% | 39.0% | 36.7%
3 | 59.6% | 33.3% | 46.7% | 18.8%
4 | 69.0% | 52.0% | 52.7% | 32.5%
6 | 62.9% | 55.7% | 33.7% | 19.8%
Total | 64.5% | 47.7% | 42.7% | 25.9%

Table 2: Percentage of total energy classified correctly for different houses, using FHMM disaggregation and a simple model that predicts the device's average consumption percentage at each time.

Table 2 shows the disaggregation performance of the FHMM model on the five houses we consider. We focus on two testing procedures: in the first case we build HMM models from devices in a given house, and then attempt to disaggregate energy in that house; in the second case, we train on four of the houses and test on the remaining held-out house. This procedure is analogous to "training" versus "testing" error, and we label the results accordingly in Table 2. For comparison we also show the performance of a simple mean prediction algorithm, which estimates the total percentage that each device type consumes and predicts that the total energy breaks down according to this percentage at all times.

As seen in Table 2, the FHMM is able to disaggregate the power data reasonably well; as expected, there is a significant drop in accuracy when moving from training prediction to test prediction, but the FHMM method still works substantially better than simple mean prediction. Although average accuracies of around 50% may seem low, we emphasize that this is for the case of predicting a previously unseen set of devices, and this metric measures the percentage of total energy correctly classified at each 10 second interval. If we aggregate the predictions over a longer time horizon, then errors tend to "cancel out," and we often obtain much higher accuracy. Figure 8, for example, shows the total true and predicted energy for House 5 (training only on houses 1-4), summed over two weeks. At this level of aggregation, the method classifies 82% of the energy correctly, and such "aggregated" charts have significant value for user feedback.

Figure 8: Pie chart showing predicted and actual energy consumed for House 5 (when the FHMM is trained only on houses 1-4), averaged over the course of two weeks.

5. CONCLUSION

This paper has introduced REDD, a data set for research on energy disaggregation. Energy disaggregation is an algorithmic challenge where advances can have a real impact on energy efficiency and sustainability. We have described the hardware and software setup and demonstrated a standard algorithm, the FHMM, for the disaggregation task.

Our ultimate goal in developing REDD, however, is to provide an easily accessible data set for researchers working in data mining or machine learning. Thus, we highlight the fact that while FHMMs performed reasonably well in the experiments we presented, there is also much room for improvement. For example, Figure 9 shows an actual and predicted signal for the refrigerator in House 5; although the FHMM sometimes extracts the signal correctly, it also often fails to detect the refrigerator or estimates a noisy and unrelated signal. Many modifications, such as including explicit durations via an HSMM [15], incorporating hard constraints on device signals, or looking at more complex features of the power signal, can all help to improve this performance. Of particular interest to us is how such techniques could extend to generalize across different devices in multiple homes. We are also excited by the prospect of semi-supervised techniques for disaggregation; while REDD aims to be a large resource, we can only outfit so many homes with such detailed sensing, and a great challenge that remains is to discover ways to merge this type of high-fidelity measurement with the massive amounts of (unlabeled) smart meter data that utility companies currently generate. Our hope is that the availability of a data set such as REDD can further motivate the machine learning and data mining communities to tackle this problem.

Figure 9: Predicted versus actual energy signal for the refrigerator in House 5.

Acknowledgments

This work was supported by ARPA-E (Advanced Research Projects Agency-Energy) under grant number DE-AR0000018. J. Zico Kolter is supported by a National Science Foundation Computing Innovation Fellowship. Matt J. Johnson is supported by a National Science Foundation Graduate Research Fellowship. We thank Carrie Armel and Mario Berges for helpful discussions. We thank Enmetric Systems and Powerhouse Dynamics for their assistance in using their hardware for this project.

6. REFERENCES

[1] S. Darby. The effectiveness of feedback on energy consumption. Technical report, Environmental Change Institute, University of Oxford, 2006.
[2] M. Everingham, L. V. Gool, C. Williams, J. Winn, and A. Zisserman. The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2):303-338, 2010.
[3] L. Fei-Fei, R. Fergus, and P. Perona. Learning generative visual models from few training examples: An incremental Bayesian approach tested on 101 object categories. Computer Vision and Image Understanding, 106(1):59-70, 2007.
[4] Z. Ghahramani and M. I. Jordan. Factorial hidden Markov models. Machine Learning, 29(2-3):245-273, 1997.
[5] S. Gupta, S. Reynolds, and S. N. Patel. ElectriSense: Single-point sensing using EMI for electrical event detection and classification in the home. In Proceedings of the Conference on Ubiquitous Computing, 2010.
[6] G. Hart. Nonintrusive appliance load monitoring. Proceedings of the IEEE, 80(12), 1992.
[7] H. Kim, M. Marwah, M. Arlitt, G. Lyon, and J. Han. Unsupervised disaggregation of low frequency power measurements. In Proceedings of the SIAM Conference on Data Mining, 2011.
[8] J. Z. Kolter, S. Batra, and A. Y. Ng. Energy disaggregation via discriminative sparse coding. In Neural Information Processing Systems, 2010.
[9] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278-2324, 1998.
[10] M. P. Marcus, M. A. Marcinkiewicz, and B. Santorini. Building a large annotated corpus of English: the Penn treebank. Computational Linguistics: Special issue on using large corpora: II, 19(2), 1993.
[11] Multiple. Annual energy review 2009. Technical report, U.S. Energy Information Administration, 2009.
[12] National Institute of Health. NIH data sharing policy and implementation guidance. http://grants.nih.gov/grants/policy/data_sharing/data_sharing_guidance.htm, 2003.
[13] B. Neenan and J. Robinson. Residential electricity use feedback: A research synthesis and economic framework. Technical report, Electric Power Research Institute, 2009.
[14] S. N. Patel, T. Robertson, J. A. Kientz, M. S. Reynolds, and G. Abowd. At the flick of a switch: detecting and classifying unique electrical events on the residential power line. In Proceedings of the Conference on Ubiquitous Computing, 2006.
[15] L. Rabiner. A tutorial on HMM and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257-286, February 1989.
[16] Science Magazine. Science magazine: General policies. http://www.sciencemag.org/site/help/authors/policies.xhtml, 2011.
[17] U.S. Department of Energy. Open energy info. http://www.openei.org, 2011.
[18] M. Zeifman and K. Roth. Nonintrusive appliance load monitoring: Review and outlook. IEEE Transactions on Consumer Electronics, 57(1):76-84, 2011.
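
Section 2 notes that both real and reactive power can be computed from the sampled AC waveform. The following is a minimal sketch of one common way to do this, not the code shipped with REDD. It assumes purely sinusoidal voltage and current (no harmonics), so that the magnitude of reactive power follows from apparent and real power; the function names, the synthetic test waveform, and its peak values are all illustrative assumptions. The 250 samples per cycle correspond to one 60 Hz cycle sampled at 15 kHz.

```python
import math


def real_and_reactive_power(voltage, current):
    """Estimate real power (W) and reactive power magnitude (VAR).

    Both inputs are equal-length lists of samples that should cover a
    whole number of AC cycles. Assumes sinusoidal waveforms, so that
    reactive power is sqrt(apparent^2 - real^2); the sign (leading or
    lagging) is not recovered by this sketch.
    """
    n = len(voltage)
    real = sum(v * i for v, i in zip(voltage, current)) / n
    v_rms = math.sqrt(sum(v * v for v in voltage) / n)
    i_rms = math.sqrt(sum(i * i for i in current) / n)
    apparent = v_rms * i_rms
    reactive = math.sqrt(max(apparent ** 2 - real ** 2, 0.0))
    return real, reactive


def synthetic_cycle(phase_lag, samples=250, v_peak=170.0, i_peak=10.0):
    """One cycle of a hypothetical load whose current lags the voltage."""
    angles = [2 * math.pi * k / samples for k in range(samples)]
    voltage = [v_peak * math.sin(a) for a in angles]
    current = [i_peak * math.sin(a - phase_lag) for a in angles]
    return voltage, current


if __name__ == "__main__":
    v, i = synthetic_cycle(phase_lag=math.pi / 3)
    p, q = real_and_reactive_power(v, i)
    print(f"real power: {p:.1f} W, reactive power: {q:.1f} VAR")
```

For the synthetic load above, the expected values are 0.5 x 170 x 10 x cos(60°) = 425 W and 0.5 x 170 x 10 x sin(60°), roughly 736 VAR. Real household waveforms contain harmonics, for which this simple decomposition is only approximate.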
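
The resolution figures quoted in Section 3.1 can be reproduced with a few lines of arithmetic. The sketch below uses the numbers stated there (70 µV converter noise, CTs rated for 200 A with a 3 V maximum output, a 1:100 probe). The nominal phase voltage of 120 V is an assumption that the paper does not state; it is used only to turn the current resolution into an approximate power resolution.

```python
# Values quoted in Section 3.1.
ADC_NOISE_VOLTS = 70e-6      # approximate noise of the A/D converter
CT_RATED_AMPS = 200.0        # current the TED CTs are rated for
CT_MAX_OUTPUT_VOLTS = 3.0    # maximum CT output voltage
PROBE_STEP_DOWN = 100.0      # 1:100 oscilloscope probe

# Assumption (not stated in the paper): nominal voltage of one phase.
ASSUMED_PHASE_VOLTS = 120.0


def current_resolution_amps():
    """Smallest current change distinguishable above the A/D noise."""
    amps_per_volt = CT_RATED_AMPS / CT_MAX_OUTPUT_VOLTS
    return amps_per_volt * ADC_NOISE_VOLTS


def voltage_resolution_volts():
    """Smallest mains voltage change distinguishable above the A/D noise."""
    return ADC_NOISE_VOLTS * PROBE_STEP_DOWN


def power_resolution_watts(phase_volts=ASSUMED_PHASE_VOLTS):
    """Approximate power change matching the current resolution."""
    return current_resolution_amps() * phase_volts


if __name__ == "__main__":
    print(f"current resolution: {current_resolution_amps() * 1e3:.2f} mA")
    print(f"voltage resolution: {voltage_resolution_volts() * 1e3:.1f} mV")
    print(f"power resolution:   {power_resolution_watts():.2f} W")
```

Under these assumptions the sketch gives a current resolution near the 4.66 mA quoted in the text, a voltage resolution of 7 mV, and a power resolution close to the quoted 0.5 W.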
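
The daily data volume stated in Section 3.2 follows from the logging parameters given there: three channels (two current phases and one voltage phase), 15 kHz sampling, and 24-bit readings. The sketch below assumes that each reading is stored packed in exactly 3 bytes with no timestamps or file overhead, which is an assumption about the storage format rather than a documented detail.

```python
SAMPLE_RATE_HZ = 15_000   # whole-home sampling rate (Section 3.2)
CHANNELS = 3              # two current phases plus one voltage phase
BYTES_PER_SAMPLE = 3      # assumption: 24-bit readings stored packed
SECONDS_PER_DAY = 24 * 60 * 60


def raw_bytes_per_day():
    """Raw waveform bytes logged per home per day, ignoring overhead."""
    return SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE * SECONDS_PER_DAY


def compressed_range_gb(raw_bytes, min_ratio=1.5, max_ratio=3.0):
    """Best-case and worst-case size in GB for the quoted bzip2 ratios."""
    return raw_bytes / max_ratio / 1e9, raw_bytes / min_ratio / 1e9


if __name__ == "__main__":
    raw = raw_bytes_per_day()
    low, high = compressed_range_gb(raw)
    print(f"raw: {raw / 1e9:.2f} GB/day")
    print(f"compressed: {low:.2f} to {high:.2f} GB/day")
```

This gives about 11.7 GB per day before compression, in line with the roughly 11 GB stated in the text, and between about 3.9 and 7.8 GB per day at the quoted compression ratios.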
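
The accuracy metric of Equation (1) and the simple mean baseline of Section 4 can be sketched in a few lines. This is an illustrative implementation with hypothetical function names and toy data, not the evaluation code from the REDD release. It takes the aggregate signal at each time step to be the sum of the true device signals, as in the FHMM definition.

```python
def total_energy_accuracy(predicted, actual):
    """Equation (1): fraction of total energy assigned to the right device.

    Both arguments are lists over time steps; each element is a list of
    per-device power values in the same device order.
    """
    abs_error = 0.0
    total_energy = 0.0
    for pred_t, act_t in zip(predicted, actual):
        abs_error += sum(abs(p - a) for p, a in zip(pred_t, act_t))
        total_energy += sum(act_t)  # aggregate = sum of device powers
    return 1.0 - abs_error / (2.0 * total_energy)


def mean_share_prediction(train_actual, test_aggregate):
    """Simple mean baseline: split each aggregate reading by fixed shares.

    The share of each device is its fraction of total training energy.
    """
    n_devices = len(train_actual[0])
    device_totals = [sum(row[i] for row in train_actual)
                     for i in range(n_devices)]
    grand_total = sum(device_totals)
    shares = [d / grand_total for d in device_totals]
    return [[s * agg for s in shares] for agg in test_aggregate]


if __name__ == "__main__":
    # Toy data: two devices over three time steps (made-up numbers).
    actual = [[100.0, 0.0], [100.0, 50.0], [0.0, 50.0]]
    aggregate = [sum(row) for row in actual]
    baseline = mean_share_prediction(actual, aggregate)
    print(f"baseline accuracy: {total_energy_accuracy(baseline, actual):.3f}")
```

On the toy data the baseline assigns two thirds of every reading to the first device, which is exactly right at the second time step and wrong at the other two. Because the predictions sum to the aggregate, every misassigned watt appears once as an overestimate and once as an underestimate, which is why the denominator carries the factor of 2.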