Received 24 June 1998; received in revised form 29 April 1999; accepted 29 April 1999

Pattern Recognition 33 (2000) 1455

Genetic algorithm-based clustering technique

Ujjwal Maulik, Sanghamitra Bandyopadhyay

Department of Computer Science, Government Engineering College, Kalyani, Nadia, India
Machine Intelligence Unit, Indian Statistical Institute, 203, B.T. Road, Calcutta - 700 035, India

Abstract

A genetic algorithm-based clustering technique, called GA-clustering, is proposed in this article. The searching capability of genetic algorithms is exploited in order to search for appropriate cluster centres in the feature space such that a similarity metric of the resulting clusters is optimized. The chromosomes, which are represented as strings of real numbers, encode the centres of a fixed number of clusters. The superiority of the GA-clustering algorithm over the commonly used K-means algorithm is extensively demonstrated for four artificial [...]

Corresponding author. Present address: School of Computer Science and Engineering, University of New South Wales, Sydney 2052, Australia; Tel.: 00-61-2-9385-3975; fax: 00-61-2-[...]
E-mail addresses: [...]hotmail.com (U. Maulik), san[...]cse.unsw.edu.au (S. Bandyopadhyay).
On leave from Indian Statistical Institute.

[...] over and [...] are applied on these strings to yield [...]
[...] clustering problem. We have implemented elitism at each generation by preserving the best string seen up to that generation in a location outside the population. Thus on termination, this location contains the centres of the [...]

The next section provides the results of implementation of the GA-clustering algorithm, along with its comparison with the performance of the K-means algorithm for several artificial and real-life data sets.

4. Implementation results

The experimental results comparing the GA-clustering algorithm with the K-means algorithm are provided for four artificial data sets (Data 1, Data 2, Data 3 and Data 4) and three real-life data sets ([...], [...] and Crude Oil). These are first described below:

4.1. Artificial data sets

Data 1: This is a nonoverlapping two-dimensional data set where the number of clusters is two. It has 10 points. The value of K is chosen to be 2 for this data set.

Data 2: This is a nonoverlapping two-dimensional data set where the number of clusters is three. It has 76 points. The clusters are shown in Fig. 2. The value of K is chosen to be 3 for this data set.

Data 3: This is an overlapping two-dimensional triangular distribution of data points having nine classes where all the classes are assumed to have equal a priori probabilities (= 1/9). It has 900 data points. The ranges for the nine classes are as follows:

Class 1: [-3.3, -0.7] × [0.7, 3.3]
Class 2: [-1.3, 1.3] × [0.7, 3.3]
Class 3: [0.7, 3.3] × [0.7, 3.3]
Class 4: [-3.3, -0.7] × [-1.3, 1.3]
Class 5: [-1.3, 1.3] × [-1.3, 1.3]
Class 6: [0.7, 3.3] × [-1.3, 1.3]
Class 7: [-3.3, -0.7] × [-3.3, -0.7]
Class 8: [-1.3, 1.3] × [-3.3, -0.7]
Class 9: [0.7, 3.3] × [-3.3, -0.7]

Thus the domain for the triangular distribution for each class and for each axis is 2.6. Consequently, the height will be $1/1.3$ (since $\frac{1}{2} \times 2.6 \times \text{height} = 1$). The data set is shown in Fig. 3. The value of K is chosen to be 9 for this data set.

Data 4: This is an overlapping ten-dimensional data set generated using a triangular distribution of the form shown in Fig. 4 for two classes, 1 and 2. It has 1000 data points. The value of K is chosen to be 2 for this data set. The range for class 1 is [0, 2] × [0, 2] × [0, 2] ... 10 times, and that for class 2 is [1, 3] × [0, 2] × [0, 2] ... 9 times, with the corresponding peaks at (1, 1) and (2, 1). The [...]

Fig. 2. Data 2.
Fig. 3. Data 3 (separate plotting symbols for points from class 1 through points from class 9).
Fig. 4. Triangular distribution along the [...]

Table 1. Clustering metric obtained by K-means algorithm for five different initial configurations for Data 1

Initial configuration    K-means
1                        5.383132
2                        2.225498
3                        2.225498
4                        5.383132
5                        2.225498

Table 2. Clustering metric obtained by GA-clustering algorithm for five different initial populations for Data 1 after 100 iterations when K = 2

Initial population       GA-clustering
1                        2.225498
2                        2.225498
3                        2.225498
4                        2.225498
5                        2.225498

Table 3. Clustering metric obtained by K-means algorithm for five different initial configurations for Data 2

Initial configuration    K-means
1                        51.013294
2                        64.646739
3                        67.166768
4                        51.013294
5                        64.725676
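The description of Data 3 can be made concrete with a short sketch. The following is only an illustration under stated assumptions, not the authors' generator: the class ranges are a reading of the partly garbled list in this transcript (the lost minus signs and the first x-range are inferred from the 3 × 3 layout and the stated domain width of 2.6), the peak of each triangle is assumed to sit at the midpoint of its range, and equal a priori probabilities are taken to mean exactly 100 points per class. The names `X_RANGES`, `Y_RANGES` and `make_data3` are hypothetical.

```python
import random

# Assumed per-axis ranges (each of width 2.6), read off the class list above.
X_RANGES = [(-3.3, -0.7), (-1.3, 1.3), (0.7, 3.3)]
Y_RANGES = [(0.7, 3.3), (-1.3, 1.3), (-3.3, -0.7)]


def make_data3(points_per_class=100, seed=0):
    """Return a list of (x, y, class_label) tuples for nine overlapping classes.

    Each coordinate is drawn from a triangular distribution over the class
    range; placing the peak at the midpoint is an assumption of this sketch.
    """
    rng = random.Random(seed)
    data = []
    label = 0
    for y_lo, y_hi in Y_RANGES:          # classes 1-3, then 4-6, then 7-9
        for x_lo, x_hi in X_RANGES:
            label += 1
            for _ in range(points_per_class):
                x = rng.triangular(x_lo, x_hi, (x_lo + x_hi) / 2)
                y = rng.triangular(y_lo, y_hi, (y_lo + y_hi) / 2)
                data.append((x, y, label))
    return data


if __name__ == "__main__":
    data = make_data3()
    print(len(data), "points in", len({label for _, _, label in data}), "classes")
```

Because neighbouring ranges such as [-1.3, 1.3] and [0.7, 3.3] share the strip [0.7, 1.3], adjacent classes overlap, which is what makes this data set harder to cluster than Data 1 and Data 2.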
The results of implementation of the K-means algorithm and GA-clustering algorithm are shown, respectively, in Tables 1 and 2 for Data 1, Tables 3 and 4 for Data 2, Tables 5 and 6 for Data 3, Tables 7 and 8 for Data 4, Tables 9 and 10 for [...], Tables 11 and 12 for [...] and Tables 13 and 14 for Crude Oil. Both the algorithms were run for 100 simulations. For the purpose of demonstration, five different initial configurations of the K-means algorithm and five different initial populations of the GA-clustering algorithm are shown in the tables.

For Data 1 (Tables 1 and 2) it is found that the GA-clustering algorithm provides the optimal value of 2.225498 in all the runs. The K-means algorithm also attains this value most of the time (87% of the total runs). However, in the other cases, it gets stuck at a value of 5.383132. For Data 2 (Tables 3 and 4), GA-clustering attains the best value of 51.013294 in all the runs. K-means, on the other hand, attains this value in 51% of the total runs, while in other runs it gets stuck at different sub-optimal values. Similarly, for Data 3 (Tables 5 and 6) and Data 4 (Tables 7 and 8) the GA-clustering algorithm attains the best values of 966.312576 and 1246.218355 in 20% and 85% of the total runs, respectively. The best [...]

Table 4. Clustering metric obtained by GA-clustering algorithm for five different initial populations for Data 2 after 100 iterations when K = 3

Initial population       GA-clustering
1                        51.013294
2                        51.013294
3                        51.013294
4                        51.013294
5                        51.013294

Table 5. Clustering metric obtained by K-means algorithm for five different initial configurations for Data 3

Initial configuration    K-means
1                        976.235607
2                        976.378990
3                        976.378990
4                        976.564189
5                        976.378990

Table 6. Clustering metric obtained by GA-clustering algorithm for five different initial populations for Data 3 after 100 iterations when K = 9

Initial population       GA-clustering
1                        966.350481
2                        966.381601
3                        966.350485
4                        966.312576
5                        966.354085

Table 7. Clustering metric obtained by K-means algorithm for five different initial configurations for Data 4

Initial configuration    K-means
1                        1246.239153
2                        1246.239153
3                        1246.236680
4                        1246.239153
5                        1246.237127

Table 15. Clustering metric obtained by GA-clustering algorithm for five different initial populations for [...] after 500 iterations when [...]

Initial population       GA-clustering
1                        149344.229245
2                        149370.762900
3                        149342.990377
4                        149352.289363
5                        149362.661869
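The values tabulated above are values of the clustering metric, which this article describes as the sum of the Euclidean distances of each point from its respective cluster centre, evaluated on chromosomes that encode the centres as strings of real numbers. A minimal sketch of how such a value could be computed, and of how the best string could be kept outside the population as the elitism description suggests, is given below. It is an illustration under assumptions rather than the authors' implementation: the fitness computation section is not part of this transcript, so assigning each point to its nearest centre is an assumption here, and the names `decode`, `clustering_metric` and `update_elite` are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Chromosome = Sequence[float]
Elite = Tuple[float, List[float]]  # (metric value, copy of the chromosome)


def decode(chromosome: Chromosome, k: int) -> List[List[float]]:
    """Split a flat string of real numbers into k cluster centres."""
    if k <= 0 or len(chromosome) % k != 0:
        raise ValueError("chromosome length must be a positive multiple of k")
    dim = len(chromosome) // k
    return [list(chromosome[i * dim:(i + 1) * dim]) for i in range(k)]


def clustering_metric(points: Sequence[Sequence[float]],
                      centres: Sequence[Sequence[float]]) -> float:
    """Sum of Euclidean distances from each point to its nearest centre.

    Nearest-centre assignment is an assumption of this sketch.
    """
    return sum(min(math.dist(p, c) for c in centres) for p in points)


def update_elite(elite: Optional[Elite],
                 population: Sequence[Chromosome],
                 points: Sequence[Sequence[float]],
                 k: int) -> Optional[Elite]:
    """Keep the best string seen so far in a location outside the population."""
    best = elite
    for chromosome in population:
        score = clustering_metric(points, decode(chromosome, k))
        if best is None or score < best[0]:
            best = (score, list(chromosome))  # copy, so later mutation cannot alter it
    return best


if __name__ == "__main__":
    points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
    elite = update_elite(None, [[0.0, 0.0, 0.0, 2.0]], points, k=2)
    elite = update_elite(elite, [[0.0, 1.0, 10.0, 1.0]], points, k=2)
    print(elite)
```

Calling `update_elite` once per generation means the stored metric value can never get worse from one generation to the next, whatever crossover and mutation do to the population itself.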
[...] (reached 60% of the times) and 279.484810 (reached 30% of the times), respectively.

From Tables 9 and 10 for [...], it is found that, unlike the other cases, the GA-clustering algorithm attains one value (149406.851288) that is poorer than the best value of the K-means algorithm (149373.097180). In order to investigate whether the GA-clustering algorithm can improve its clustering performance, it was executed up to 500 iterations (rather than 100 iterations as was done previously). The results are shown in Table 15. As expected, it is found that the performance of GA-clustering improves. The best value that it now attains is 149342.990377 and the worst is 149370.762900, both of which are better than those obtained after 100 iterations. Moreover, its performance in all the runs is now better than the performance of the K-means algorithm for any of the 100 runs.

5. Discussion and conclusions

A genetic algorithm-based clustering algorithm, called GA-clustering, has been developed in this article. Genetic algorithm has been used to search for the cluster centres which minimize the clustering metric. In order to demonstrate the effectiveness of the GA-clustering algorithm in providing optimal clusters, several artificial and real-life data sets with the number of dimensions ranging from two to ten and the number of clusters ranging from two to nine have been considered. The results show that the GA-clustering algorithm provides a performance that is significantly superior to that of the K-means algorithm, a very widely used clustering technique.

Floating-point representation of chromosomes has been adopted in this article, since it is conceptually closest to the problem space and provides a straightforward way of mapping from the encoded cluster centres to the actual ones. In this context, a binary representation may be implemented for the same problem, and the results may be compared with the present floating-point form. Such an investigation is currently being performed.

Note that the clustering metric that the GA attempts to minimize is given by the sum of the absolute Euclidean distances of each point from their respective cluster centres. We have also implemented the same algorithm by using the sum of the squared Euclidean distances as the minimizing criterion. The same conclusions as obtained in this article are still found to hold good.

It has been proved in Ref. [23] that an elitist model of GAs will definitely provide the optimal string as the number of iterations goes to infinity, provided the probability of going from any population to the one containing the optimal string is greater than zero. Note that this has been proved for nonzero mutation probability values and is independent of the probability of crossover. However, since the rate of convergence to the optimal string will definitely depend on these parameters, a proper choice of these values is imperative for the good performance of the algorithm. Note that the mutation operator as used in this article also allows nonzero probability of going from any string to any other string. Therefore, our GA-clustering algorithm will also provide the optimal clusters as the number of iterations goes to infinity. A formal theoretical proof is currently being developed that will effectively serve as a theoretical proof of the optimality of the clusters provided by the GA-clustering algorithm. However, it is imperative to once again realize that for practical purposes a proper choice of the genetic parameters, which may possibly be kept adaptive, is crucial for a good performance of the algorithm. In this context, one may note that although the K-means algorithm got stuck at sub-optimal solutions, even for the simple data sets, the GA-clustering algorithm did not exhibit any such unwanted behaviour.

6. Summary

Clustering is an important unsupervised classification technique where a set of patterns, usually vectors in a multi-dimensional space, are grouped into clusters in such a way that patterns in the same cluster are similar in some sense and patterns in different clusters are dissimilar in the same sense. For this it is necessary to define a measure of similarity which will establish a rule for assigning patterns to the domain of a particular cluster centre. One such measure of similarity may be the Euclidean distance between two patterns. The smaller the distance between the two patterns, the greater is the similarity between them, and vice versa. An intuitively simple and effective clustering technique is the well-known K-means algorithm. However, it is known that the K-means algorithm may get stuck at suboptimal solutions, depending on the choice of the initial cluster centres. In this article, we propose a solution to the clustering problem where genetic algorithms (GAs) are used for searching for the appropriate cluster [...]

About the Author: UJJWAL MAULIK did his Bachelors in Physics and Computer Science in 1986 and 1989, respectively. Subsequently, he did his Masters and Ph.D in Computer Science in 1991 and 1997, respectively, from Jadavpur University, India. Dr. Maulik has visited Center for Adaptive Systems Applications, Los Alamos, New Mexico, USA in 1997. He is currently the Head of the Department of Computer Science, Kalyani Engineering College, India. His research interests include Parallel Processing and Interconnection Networks, Natural Language Processing, Evolutionary Computation and Pattern Recognition.

About the Author: SANGHAMITRA BANDYOPADHYAY did her Bachelors in Physics and Computer Science in 1988 and 1991, respectively. Subsequently, she did her Masters in Computer Science from Indian Institute of Technology, Kharagpur in 1993 and Ph.D in Computer Science from Indian Statistical Institute, Calcutta in 1998. Dr. Bandyopadhyay is the recipient of Dr. Shanker Dayal Sharma Gold Medal and Institute Silver Medal for being adjudged the best all round postgraduate performer in 1993. She has visited Los Alamos National Laboratory in 1997. She is currently on a postdoctoral assignment in University of New South Wales, Sydney, Australia. Her research interests include Evolutionary Computation, Pattern Recognition, Parallel Processing and Interconnection [...]