uni-wuerzburg.de

Abstract. When intelligent systems are deployed into a real-world application, the maintenance and the refinement of the knowledge are essential tasks. Many existing automatic knowledge refinement methods only provide limited c...
2 Subgroup Mining

In this section, we first introduce the used knowledge representation; then, we briefly describe the subgroup mining approach.

General Definitions

Let $\Omega_A$ be the set of all attributes with an associated domain $dom(a)$ of values. $\Omega_D \subseteq \Omega_A$ denotes the set of all diagnoses. $V_A$ is defined as the (universal) set of attribute values of the form $(a = v)$, $a \in \Omega_A$, $v \in dom(a)$. For each diagnosis $d \in \Omega_D$ we define a (boolean) range $dom(d)$:

$$\forall d \in \Omega_D : dom(d) = \{established, not\ established\}.$$

A diagnosis $d \in \Omega_D$ is derived by (heuristic) rules. A rule $r$ for the diagnosis $d$ can be considered as a triple $(cond(r), conf(r), d)$, where $cond(r)$ is the rule condition and $conf(r)$ is the confirmation strength. Thus a rule $r = cond(r) \rightarrow d, conf(r)$ is used to derive the diagnosis $d$, where the rule condition $cond(r)$ contains conjunctions and/or disjunctions of (negated) findings $f_i \in V_A$. The state of a diagnosis is gradually inferred by summing all the confirmation strengths (points) of the rules that have fired; if the sum is greater than a specific threshold value, then the diagnosis is assumed to be established.

Let $CB$ denote the case base containing all available cases. A case $c \in CB$ is defined as a tuple $c = (V_c, D_c)$, where $V_c \subseteq V_A$ is the set of attribute values observed in the case $c$. The set $D_c \subseteq \Omega_D$ is the set of diagnoses describing the solution of this case.

Basic Subgroup Mining

Subgroup mining [Klö02] is a method to discover "interesting" subgroups of cases, e.g., "smokers with a positive family history are at a significantly higher risk for coronary heart disease". A subgroup mining task mainly relies on the following four properties: the target variable, the subgroup description language, the quality function, and the search strategy. We will focus on binary target variables.

Subgroups are described by relations between independent (explaining) variables and a dependent (target) variable. A subgroup description $sd = \{e_i\}$ is defined by the conjunction of a set of selection expressions. These selectors $e_i = (a_i, V_i)$ are selections on domains of attributes, $a_i \in \Omega_A$, $V_i \subseteq dom(a_i)$.
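The definitions above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Rule`, `derive_state`, `covers`, the attribute names, the point values and the threshold of 42 are all hypothetical and chosen only for illustration. A case is represented as a mapping from attribute to observed value, a rule carries its condition, confirmation strength and diagnosis, and a subgroup description is a list of selectors $(a_i, V_i)$ interpreted as a conjunction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Tuple

# Observed attribute values V_c of a case, stored as attribute -> value.
Case = Dict[str, str]
# A selector e_i = (a_i, V_i): an attribute and the set of admitted values.
Selector = Tuple[str, FrozenSet[str]]


@dataclass(frozen=True)
class Rule:
    cond: Callable[[Case], bool]  # rule condition cond(r)
    conf: float                   # confirmation strength conf(r), in points
    diagnosis: str                # derived diagnosis d


def derive_state(diagnosis: str, rules: List[Rule], case: Case,
                 threshold: float) -> str:
    """Sum the points of all fired rules for `diagnosis`; compare to threshold."""
    points = sum(r.conf for r in rules
                 if r.diagnosis == diagnosis and r.cond(case))
    return "established" if points > threshold else "not established"


def covers(sd: List[Selector], case: Case) -> bool:
    """A case belongs to the subgroup if every selector of sd matches it."""
    return all(case.get(attribute) in values for attribute, values in sd)


# Hypothetical rules and threshold, for illustration only.
RULES = [
    Rule(lambda c: c.get("smoker") == "true", 20.0, "chd"),
    Rule(lambda c: c.get("family_history") == "positive"
         and c.get("hypertension") == "true", 30.0, "chd"),
]
THRESHOLD = 42.0

case_a = {"smoker": "true", "family_history": "positive", "hypertension": "true"}
case_b = {"smoker": "true", "family_history": "positive"}
sd = [("smoker", frozenset({"true"})),
      ("family_history", frozenset({"positive"}))]

print(derive_state("chd", RULES, case_a, THRESHOLD))  # both rules fire: 50 points
print(derive_state("chd", RULES, case_b, THRESHOLD))  # one rule fires: 20 points
print(covers(sd, case_a), covers(sd, case_b))
```

In this sketch an attribute that is not observed in a case never satisfies a selector; whether that matches the intended semantics for unknown values is an assumption.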
$\Omega_{sd}$ denotes the set of all possible subgroup descriptions.

A quality function measures the interestingness of the subgroup mainly based on a statistical evaluation function. It is used by the search method to rank the discovered subgroups during search. Formally, a quality function $q : \Omega_{sd} \times V_A \rightarrow \mathbb{R}$ evaluates a subgroup description $sd \in \Omega_{sd}$ given a target variable $t \in V_A$. Several quality functions are proposed, for example in [Klö02]. The exemplary quality function

$$q_{BT} = \frac{(p - p_0) \cdot \sqrt{n}}{\sqrt{p_0 \cdot (1 - p_0)}} \cdot \sqrt{\frac{N}{N - n}}$$

is applicable for binary target variables, where $p$ is the relative frequency of the target variable in the subgroup, $p_0$ is the relative frequency of the target variable in the total population, $N$ is the size of the total population, and $n$ denotes the size of the subgroup.

An efficient search strategy is necessary for subgroup mining, since the search space is exponential concerning all possible selection expressions. We apply a modified beam search, where a subgroup description can be selected as an initial value for the beam.

Exemplifying Subgroup Mining Results

As outlined above, the results of the subgroup mining step are a set of subgroups which are used to derive a set of potential faulty factors $PFF$ (principal and supporting factors). These are then presented and proposed for refinement. For example, consider the subgroup "smokers with a positive family history are at a significantly higher risk for coronary heart disease": the principal factors consist of smoker=true and family history=positive, and the potential supporting factors could be hypertension=true, overweight=true, age ≥ 50.

As outlined above, the interpretation of $PFF$ depends on the judgment of the user, especially on his/her existing background knowledge. To support the user, we propose to utilize the implicit experiences contained in the cases of the case base as explaining examples for $PFF$. Then, typical and extreme cases with a high coverage of the set of $PFF$ can be retrieved for presentation to the user. These cases contain "real-world" experience, and additional factors that are related to $PFF$. These factors can potentially help to further support refinement decisions.

A naive solution retrieves all cases contained in the subgroup that also contain the target concept. However, this approach suffers from two shortcomings: first, the set of cases can be too large for a comprehensive overview, and second, a subset of $PFF$, i.e., the supporting factors, is not accounted for very precisely. Therefore, we aim to retrieve a set of cases that have a high coverage with the set $PFF$. Then, we have two options to characterize $PFF$ elements: first, we can retrieve typical cases that are highly similar to $PFF$ while the individual cases can also be very similar to each other. These cases can be used to exemplify the most common factors contained in $PFF$. Second, we can retrieve extreme cases, i.e., cases that are very similar to $PFF$ but not to each other. This set of diverse cases is discriminative but still similar to $PFF$ and can be used to get a comprehensive description of extreme factor combinations concerning $PFF$.

For the retrieval step we use techniques known from case-based reasoning [AP94]. Here, given a query case $q$, the general goal is to retrieve a set of most similar cases $\{c_i\}$. The attribute values contained in the query case are commonly called the problem description. We construct a "virtual" query case $q$ and define its problem description as the set of potential faulty factors $PFF_i$ obtained from a given subgroup $SG_i$. Optionally, the user can define a subset of factors contained in $PFF_i$, e.g., concentrating on the most interesting factors such that specific queries can be formulated. The factors of the constructed query case can be interactively adapted to fit the analysis task at hand.

For assessing the similarity of a query $q$ and a retrieved case $c$, we can use, e.g., the well-known matching features similarity function. Then, for case comparison the set of attributes is restricted to the attributes contained in the query (w.r.t. $PFF_i$), i.e., to the attributes $\Omega'_A = \{a \mid \exists v \in PFF_i, v \in dom(a)\}$; $a(c)$ returns the value of attribute $a$:

$$sim(q, c) = \frac{|\{a \in \Omega'_A : a(q) = a(c)\}|}{|\Omega'_A|}$$

$$diversity(RC) = \frac{\sum_{i=1}^{k-1} \sum_{j=i+1}^{k} \left(1 - sim(c_i, c_j)\right)}{k \cdot (k - 1) / 2}$$

The diversity of a set of retrieved cases $RC = \{c_i\}^k$ of size $k$ is measured according to $diversity(RC)$, where the similarity of two cases is assessed with respect to the attributes in the constructed query case $q$, as outlined above. Then a set of most similar diverse cases
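The three measures used in this section (the quality function $q_{BT}$, the matching features similarity and the diversity of a retrieved case set) can be sketched in Python as follows. This is a sketch under stated assumptions, not the authors' code: the function names `q_bt`, `sim` and `diversity`, the attribute names and the example numbers are hypothetical. The sketch further assumes that $n < N$ and $0 < p_0 < 1$, and that an attribute counts as matching only if it is observed in both cases with the same value.

```python
import math
from itertools import combinations
from typing import Dict, Sequence

Case = Dict[str, str]  # attribute -> observed value


def q_bt(p: float, p0: float, n: int, N: int) -> float:
    """Binomial test quality function q_BT (requires n < N and 0 < p0 < 1)."""
    return ((p - p0) * math.sqrt(n) / math.sqrt(p0 * (1 - p0))
            * math.sqrt(N / (N - n)))


def sim(q: Case, c: Case, attrs: Sequence[str]) -> float:
    """Share of the query attributes on which q and c have the same value."""
    # Assumption: an attribute missing in either case does not count as a match.
    matches = sum(1 for a in attrs if a in q and a in c and q[a] == c[a])
    return matches / len(attrs)


def diversity(cases: Sequence[Case], attrs: Sequence[str]) -> float:
    """Average pairwise dissimilarity (1 - sim) over all k(k-1)/2 case pairs."""
    k = len(cases)
    if k < 2:
        raise ValueError("diversity needs at least two cases")
    total = sum(1 - sim(ci, cj, attrs) for ci, cj in combinations(cases, 2))
    return total / (k * (k - 1) / 2)


# Hypothetical query built from a set of potential faulty factors.
attrs = ["smoker", "family_history", "hypertension", "overweight"]
query = {"smoker": "true", "family_history": "positive",
         "hypertension": "true", "overweight": "true"}
c1 = {"smoker": "true", "family_history": "positive",
      "hypertension": "true", "overweight": "false"}
c2 = {"smoker": "true", "family_history": "positive",
      "hypertension": "false", "overweight": "true"}
c3 = dict(c1)  # a duplicate of c1

print(sim(query, c1, attrs))       # 0.75: three of four attributes match
print(diversity([c1, c3], attrs))  # 0.0: identical cases are not diverse
print(diversity([c1, c2], attrs))  # 0.5: the cases differ on two attributes
print(q_bt(p=0.5, p0=0.2, n=100, N=1000))
```

In the example, `c1` and `c2` are equally similar to the query but differ from each other, which is the situation described above for extreme cases, whereas `c1` and `c3` would be a pair of typical cases.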
4 Summary and Future Work

In this paper we introduced an interactive approach for the refinement of rule-based knowledge. In contrast to classical (automatic) approaches, the user has to decide about the actual refinement operators to be carried out, but is strongly supported by the indication and exemplification of hot spots that are identified by a subgroup mining method. The interactive refinement approach has already been evaluated using a medical knowledge system that is currently extended and used in a real-world application. Due to the experiences made with the interactive refinement process we developed the exemplification capabilities proposed in this paper.

This method can potentially be extended using background knowledge, e.g., to split the problem descriptions into partially disjunctive partitions corresponding to certain problem areas. Then partial cases for these partitions can be retrieved and recombined, as described in [ABP03]. In the near future we are planning to evaluate the usefulness of the presented approach within an extended case study.

References

[ABH+05] Martin Atzmueller, Joachim Baumeister, Achim Hemsing, Ernst-Jürgen Richter, and Frank Puppe. Subgroup Mining for Interactive Knowledge Refinement. In Proc. 10th Conference on Artificial Intelligence in Medicine (AIME 05), Aberdeen, Scotland, 2005.

[ABP03] Martin Atzmueller, Joachim Baumeister, and Frank Puppe. Evaluation of two Strategies for Case-Based Diagnosis handling Multiple Faults. In Proc. 2nd Conference of Professional Knowledge Management (WM 2003), Luzern, Switzerland, 2003.

[AP94] Agnar Aamodt and Enric Plaza. Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches. AI Communications, 7(1), 1994.

[BC99] Robin Boswell and Susan Craw. Validation and Verification of Knowledge Based Systems, chapter Organizing Knowledge Refinement Operators, pages 149–161. Kluwer, Oslo, Norway, 1999.

[CS99] Leonardo Carbonara and Derek Sleeman. Effective and Efficient Knowledge Base Refinement. Machine Learning, 37:143–181, 1999.

[DG99] Nikolaos A. Diamantidis and E. A. Giakoumakis. An Interactive Tool for Knowledge Base Refinement. Expert Systems, 16(1):2–10, 1999.

[Gin88] Allen Ginsberg. Automatic Refinement of Expert System Knowledge Bases. Morgan Kaufmann, 1988.

[Klö02] Willi Klösgen. Handbook of Data Mining and Knowledge Discovery, chapter 16.3: Subgroup Discovery. Oxford University Press, New York, 2002.

[KPJ+02] Rainer Knauf, Ilka Philippow, Klaus P. Jantke, Avelino Gonzalez, and Dirk Salecker. System Refinement in Practice: Using a Formal Method to Modify Real-Life Knowledge. In Proc. 15th International Florida Artificial Intelligence Research Society Conference (FLAIRS-2002). AAAI Press, 2002.

[McS02] David McSherry. Diversity-Conscious Retrieval. In Proc. 6th European Conference on Advances in Case-Based Reasoning, pages 219–233, London, UK, 2002. Springer.