Exemplifying Subgroup Mining Results for Interactive K



…uni-wuerzburg.de

Abstract. When intelligent systems are deployed into a real-world application, the maintenance and the refinement of the knowledge are essential tasks. Many existing automatic knowledge refinement methods only provide limited c…



2 Subgroup Mining

In this section, we first introduce the knowledge representation used; then, we briefly describe the subgroup mining approach.

General Definitions. Let $\Omega_A$ be the set of all attributes, each with an associated domain $\mathrm{dom}(a)$ of values. $\Omega_D \subseteq \Omega_A$ denotes the set of all diagnoses. $V_A$ is defined as the (universal) set of attribute values of the form $(a = v)$, $a \in \Omega_A$, $v \in \mathrm{dom}(a)$. For each diagnosis $d \in \Omega_D$ we define a (boolean) range $\mathrm{dom}(d)$: $\forall d \in \Omega_D: \mathrm{dom}(d) = \{\text{established}, \text{not established}\}$.

A diagnosis $d \in \Omega_D$ is derived by (heuristic) rules. A rule $r$ for the diagnosis $d$ can be considered as a triple $\langle \mathrm{cond}(r), \mathrm{conf}(r), d \rangle$, where $\mathrm{cond}(r)$ is the rule condition and $\mathrm{conf}(r)$ is the confirmation strength. Thus, a rule $r = \mathrm{cond}(r) \rightarrow d, \mathrm{conf}(r)$ is used to derive the diagnosis $d$, where the rule condition $\mathrm{cond}(r)$ contains conjunctions and/or disjunctions of (negated) findings $f_i \in V_A$. The state of a diagnosis is gradually inferred by summing the confirmation strengths (points) of all rules that have fired; if the sum is greater than a specific threshold value, then the diagnosis is assumed to be established.

Let $CB$ denote the case base containing all available cases. A case $c \in CB$ is defined as a tuple $c = (V_c, D_c)$, where $V_c \subseteq V_A$ is the set of attribute values observed in the case $c$. The set $D_c \subseteq \Omega_D$ is the set of diagnoses describing the solution of this case.

Basic Subgroup Mining. Subgroup mining [Klö02] is a method to discover "interesting" subgroups of cases, e.g., "smokers with a positive family history are at a significantly higher risk for coronary heart disease". A subgroup mining task mainly relies on the following four properties: the target variable, the subgroup description language, the quality function, and the search strategy. We will focus on binary target variables. Subgroups are described by relations between independent (explaining) variables and a dependent (target) variable. A subgroup description $sd = \{e_i\}$ is defined by the conjunction of a set of selection expressions. These selectors $e_i = (a_i, V_i)$ are selections on domains of attributes, $a_i \in \Omega_A$, $V_i \subseteq \mathrm{dom}(a_i)$.
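The definitions above (cases, heuristic scoring rules, and subgroup descriptions as conjunctions of selectors) can be made concrete with a minimal Python sketch. It assumes that findings are (attribute, value) pairs, models a rule condition $\mathrm{cond}(r)$ as an arbitrary predicate over the observed findings, and checks whether a case is covered by a conjunction of selectors. All names (`Case`, `Rule`, `is_established`, `covers`) as well as the example rules, points, and threshold are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, Optional, Tuple

# A finding (a = v) from V_A, represented as an (attribute, value) pair.
Finding = Tuple[str, str]
# A selector e_i = (a_i, V_i), where V_i is a subset of dom(a_i).
Selector = Tuple[str, FrozenSet[str]]


@dataclass(frozen=True)
class Case:
    """A case c = (V_c, D_c): observed findings and the diagnoses of its solution."""
    values: FrozenSet[Finding]
    diagnoses: FrozenSet[str]

    def value_of(self, attribute: str) -> Optional[str]:
        """Return the value observed for `attribute`, or None if it was not observed."""
        for a, v in self.values:
            if a == attribute:
                return v
        return None


@dataclass(frozen=True)
class Rule:
    """A heuristic rule r = cond(r) -> d with confirmation strength conf(r)."""
    # cond(r) may combine (negated) findings with and/or; here it is any predicate.
    cond: Callable[[FrozenSet[Finding]], bool]
    conf: float
    diagnosis: str


def is_established(findings: FrozenSet[Finding], rules: Iterable[Rule],
                   diagnosis: str, threshold: float) -> bool:
    """Sum the points of all fired rules for `diagnosis` and compare to the threshold."""
    score = sum(r.conf for r in rules
                if r.diagnosis == diagnosis and r.cond(findings))
    return score > threshold  # strictly greater, as stated in the text


def covers(description: Iterable[Selector], case: Case) -> bool:
    """A subgroup description is a conjunction: every selector must match the case."""
    return all(case.value_of(a) in allowed for a, allowed in description)


if __name__ == "__main__":
    # Hypothetical rules and points, for illustration only.
    rules = [
        Rule(lambda f: ("smoker", "true") in f, conf=20, diagnosis="CHD"),
        Rule(lambda f: ("family history", "positive") in f
             and ("hypertension", "false") not in f, conf=25, diagnosis="CHD"),
    ]
    case = Case(values=frozenset({("smoker", "true"), ("family history", "positive")}),
                diagnoses=frozenset({"CHD"}))
    print(is_established(case.values, rules, "CHD", threshold=40))  # True (45 > 40)
    sd = [("smoker", frozenset({"true"})), ("family history", frozenset({"positive"}))]
    print(covers(sd, case))  # True
```

Modeling $\mathrm{cond}(r)$ as an opaque predicate keeps the sketch short; a rule base meant to be inspected and refined would more plausibly store conditions as an explicit and/or/not tree.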
$\Omega_{sd}$ denotes the set of all possible subgroup descriptions. A quality function measures the interestingness of a subgroup, mainly based on a statistical evaluation function. It is used by the search method to rank the discovered subgroups during search. Formally, a quality function $q: \Omega_{sd} \times V_A \rightarrow \mathbb{R}$ evaluates a subgroup description $sd \in \Omega_{sd}$ given a target variable $t \in V_A$. Several quality functions have been proposed, for example in [Klö02]. The exemplary quality function

$$q_{BT} = \frac{p - p_0}{\sqrt{p_0 (1 - p_0)}} \cdot \sqrt{n} \cdot \sqrt{\frac{N}{N - n}}$$

is applicable to binary target variables, where $p$ is the relative frequency of the target variable in the subgroup, $p_0$ is the relative frequency of the target variable in the total population, $N$ is the size of the total population, and $n$ denotes the size of the subgroup.

An efficient search strategy is necessary for subgroup mining, since the search space is exponential in the number of possible selection expressions. We apply a modified beam search, where a subgroup description can be selected as an initial value for the beam.

Exemplifying Subgroup Mining Results

As outlined above, the result of the subgroup mining step is a set of subgroups, which are used to derive a set of potential faulty factors PFF (principal and supporting factors). These are then presented and proposed for refinement. For example, consider the subgroup "smokers with a positive family history are at a significantly higher risk for coronary heart disease": the principal factors consist of smoker = true and family history = positive, and the potential supporting factors could be hypertension = true, overweight = true, age ≥ 50.

As outlined above, the interpretation of PFF depends on the judgment of the user, especially on his/her existing background knowledge. To support the user, we propose to utilize the implicit experience contained in the cases of the case base as explaining examples for PFF. Then, typical and extreme cases with a high coverage of the set PFF can be retrieved for presentation to the user. These cases contain "real-world" experience and additional factors related to PFF, which can help to further support refinement decisions.

A naive solution retrieves all cases contained in the subgroup that also contain the target concept. However, this approach suffers from two shortcomings: first, the set of cases can be too large for a
comprehensive overview, and second, a subset of PFF, namely the supporting factors, is not accounted for very precisely. Therefore, we aim to retrieve a set of cases that have a high coverage of the set PFF. Then, we have two options to characterize PFF elements: first, we can retrieve typical cases that are highly similar to PFF, while the individual cases can also be very similar to each other. These cases can be used to exemplify the most common factors contained in PFF. Second, we can retrieve extreme cases, i.e., cases that are very similar to PFF but not to each other. This set of diverse cases is discriminative but still similar to PFF, and it can be used to obtain a comprehensive description of extreme factor combinations concerning PFF.

For the retrieval step we use techniques known from case-based reasoning [AP94]. Here, given a query case $q$, the general goal is to retrieve a set of most similar cases $\{c_i\}$. The attribute values contained in the query case are commonly called the problem description. We construct a "virtual" query case $q$ and define its problem description as the set of potential faulty factors $PFF_i$ obtained from a given subgroup $SG_i$. Optionally, the user can define a subset of the factors contained in $PFF_i$, e.g., concentrating on the most interesting factors, such that specific queries can be formulated. The factors of the constructed query case can be interactively adapted to fit the analysis task at hand.

For assessing the similarity of a query $q$ and a retrieved case $c$, we can use, e.g., the well-known matching features similarity function. For case comparison, the set of attributes is then restricted to the attributes contained in the query (w.r.t. $PFF_i$), i.e., to the attributes $\Omega'_A = \{a \mid \exists v \in PFF_i, v \in \mathrm{dom}(a)\}$. With $a(c)$ returning the value of attribute $a$ in case $c$:

$$\mathrm{sim}(q, c) = \frac{|\{a \in \Omega'_A : a(q) = a(c)\}|}{|\Omega'_A|}$$

The diversity of a set of retrieved cases $RC = \{c_1, \ldots, c_k\}$ of size $k$ is measured according to

$$\mathrm{diversity}(RC) = \frac{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \left(1 - \mathrm{sim}(c_i, c_j)\right)}{k(k-1)/2}$$

where the similarity of two cases is assessed with respect to the attributes in the constructed query case $q$, as outlined above. Then a set of most similar diverse cases …
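The quality function $q_{BT}$, the matching features similarity restricted to $\Omega'_A$, and the diversity measure can be checked numerically with the following Python sketch. It assumes that cases and the virtual query are dictionaries from attribute names to values, passes the query attributes explicitly so that the same `sim` also serves the case-to-case comparisons inside `diversity`, and treats an attribute unobserved in either case as a mismatch. These choices and all function names are assumptions for illustration. `most_similar` only retrieves typical cases by ranking on similarity; the selection of similar but diverse (extreme) cases is not sketched, since its exact procedure is not given above.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence

# Hypothetical representation: a case, or the virtual query built from PFF_i,
# maps attribute names to values.
CaseValues = Dict[str, str]


def q_bt(p: float, p0: float, n: int, N: int) -> float:
    """q_BT = (p - p0) / sqrt(p0 (1 - p0)) * sqrt(n) * sqrt(N / (N - n))."""
    if not (0 < p0 < 1 and 0 < n < N):
        raise ValueError("q_BT requires 0 < p0 < 1 and 0 < n < N")
    return (p - p0) / math.sqrt(p0 * (1 - p0)) * math.sqrt(n) * math.sqrt(N / (N - n))


def sim(x: CaseValues, y: CaseValues, attributes: Sequence[str]) -> float:
    """Matching features similarity over the query attributes Omega'_A.

    An attribute unobserved in x or y counts as a mismatch (an assumption).
    """
    if not attributes:
        raise ValueError("the query must contain at least one attribute")
    matches = sum(1 for a in attributes
                  if x.get(a) is not None and x.get(a) == y.get(a))
    return matches / len(attributes)


def diversity(cases: Sequence[CaseValues], attributes: Sequence[str]) -> float:
    """Mean of 1 - sim(c_i, c_j) over all k(k-1)/2 unordered pairs."""
    k = len(cases)
    if k < 2:
        raise ValueError("diversity needs at least two cases")
    total = sum(1 - sim(ci, cj, attributes) for ci, cj in combinations(cases, 2))
    return total / (k * (k - 1) / 2)


def most_similar(query: CaseValues, case_base: Sequence[CaseValues],
                 k: int) -> List[CaseValues]:
    """Return the k cases most similar to the query (ties keep case-base order)."""
    attributes = list(query)  # Omega'_A: the attributes occurring in the query
    return sorted(case_base, key=lambda c: sim(query, c, attributes),
                  reverse=True)[:k]


if __name__ == "__main__":
    # Toy data, invented for illustration.
    query = {"smoker": "true", "family history": "positive", "hypertension": "true"}
    case_base = [
        {"smoker": "true", "family history": "positive", "hypertension": "true"},
        {"smoker": "true", "family history": "positive", "hypertension": "false"},
        {"smoker": "false", "family history": "negative", "hypertension": "true"},
    ]
    attrs = list(query)
    top = most_similar(query, case_base, k=2)
    print([round(sim(query, c, attrs), 2) for c in top])  # [1.0, 0.67]
    print(round(diversity(top, attrs), 2))  # 0.33
    print(round(q_bt(p=0.3, p0=0.1, n=100, N=1000), 2))  # 7.03
```

On the toy data, the two retrieved cases agree on two of the three query attributes, so their diversity is $1 - 2/3 = 1/3$.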
4 Summary and Future Work

In this paper we introduced an interactive approach for the refinement of rule-based knowledge. In contrast to classical (automatic) approaches, the user has to decide about the actual refinement operators to be carried out, but is strongly supported by the indication and exemplification of hot spots that are identified by a subgroup mining method. The interactive refinement approach has already been evaluated using a medical knowledge system that is currently being extended and used in a real-world application. Based on the experience gained with the interactive refinement process, we developed the exemplification capabilities proposed in this paper. This method can potentially be extended using background knowledge, e.g., to split the problem descriptions into partially disjoint partitions corresponding to certain problem areas. Partial cases for these partitions can then be retrieved and recombined, as described in [ABP03]. In the near future we plan to evaluate the usefulness of the presented approach within an extended case study.

References

[ABH+05] Martin Atzmueller, Joachim Baumeister, Achim Hemsing, Ernst-Jürgen Richter, and Frank Puppe. Subgroup Mining for Interactive Knowledge Refinement. In Proc. 10th Conference on Artificial Intelligence in Medicine (AIME 05), Aberdeen, Scotland, 2005.
[ABP03] Martin Atzmueller, Joachim Baumeister, and Frank Puppe. Evaluation of two Strategies for Case-Based Diagnosis handling Multiple Faults. In Proc. 2nd Conference of Professional Knowledge Management (WM 2003), Luzern, Switzerland, 2003.
[AP94] Agnar Aamodt and Enric Plaza. Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. AI Communications, 7(1), 1994.
[BC99] Robin Boswell and Susan Craw. Validation and Verification of Knowledge Based Systems, chapter Organizing Knowledge Refinement Operators, pages 149–161. Kluwer, Oslo, Norway, 1999.
[CS99] Leonardo Carbonara and Derek Sleeman. Effective and Efficient Knowledge Base Refinement. Machine Learning, 37:143–181, 1999.
[DG99] Nikolaos A. Diamantidis and E. A. Giakoumakis. An Interactive Tool for Knowledge Base Refinement. Expert Systems, 16(1):2–10, 1999.
[Gin88] Allen Ginsberg. Automatic Refinement of Expert System Knowledge Bases. Morgan Kaufmann, 1988.
[Klö02] Willi Klösgen. Handbook of Data Mining and Knowledge Discovery, chapter 16.3: Subgroup Discovery. Oxford University Press, New York, 2002.
[KPJ+02] Rainer Knauf, Ilka Philippow, Klaus P. Jantke, Avelino Gonzalez, and Dirk Salecker. System Refinement in Practice – Using a Formal Method to Modify Real-Life Knowledge. In Proc. 15th International Florida Artificial Intelligence Research Society Conference (FLAIRS-2002). AAAI Press, 2002.
[McS02] David McSherry. Diversity-Conscious Retrieval. In Proc. 6th European Conference on Advances in Case-Based Reasoning, pages 219–233, London, UK, 2002. Springer.