# Support Vector Clustering

*Journal of Machine Learning Research 2 (2001) 125-137. Submitted 3/01; Published 12/01.*

**Asa Ben-Hur** (asa@barnhilltechnologies.com), BIOwulf Technologies, 2030 Addison St., Suite 102, Berkeley, CA 94704, USA
**David Horn** (horn@post.tau.ac.il), School of Physics and Astronomy, Raymond and Beverly Sackler Faculty of Exact Sciences, Tel Aviv University, Tel Aviv 69978, Israel
**Hava T. Siegelmann** (hava@mit.edu), Lab for Information and Decision Systems, MIT, Cambridge, MA 02139, USA
**Vladimir Vapnik** (vlad@research.att.com), AT&T Labs Research, 100 Schultz Dr., Red Bank, NJ 07701, USA

**Editors:** Nello Cristianini, John Shawe-Taylor and Bob Williamson

©2001 Ben-Hur, Horn, Siegelmann and Vapnik.

## Abstract

We present a novel clustering method using the approach of support vector machines. Data points are mapped by means of a Gaussian kernel to a high-dimensional feature space, where we search for the minimal enclosing sphere. This sphere, when mapped back to data space, can separate into several components, each enclosing a separate cluster of points. We present a simple algorithm for identifying these clusters. The width of the Gaussian kernel controls the scale at which the data is probed, while the soft margin constant helps in coping with outliers and overlapping clusters. The structure of a data set is explored by varying the two parameters, maintaining a minimal number of support vectors to assure smooth cluster boundaries. We demonstrate the performance of our algorithm on several data sets.

**Keywords:** Clustering, Support Vector Machines, Gaussian Kernel

## 1. Introduction

Clustering algorithms group data points according to various criteria, as discussed by Jain and Dubes (1988), Fukunaga (1990) and Duda et al. (2001). Clustering may proceed according to some parametric model, as in the k-means algorithm of MacQueen (1965), or by grouping points according to some distance or similarity measure, as in hierarchical clustering algorithms. Other approaches include graph-theoretic methods, such as Shamir and Sharan (2000), physically motivated algorithms, as in Blatt et al. (1997), and algorithms based on density estimation, as in Roberts (1997) and Fukunaga (1990). In this paper we propose a non-parametric clustering algorithm based on the support vector approach of Vapnik (1995).

In Schölkopf et al. (2000, 2001) and Tax and Duin (1999) a support vector algorithm was used to characterize the support of a high-dimensional distribution. As a by-product of the algorithm one can compute a set of contours which enclose the data points. These contours were interpreted by us as cluster boundaries in Ben-Hur et al. (2000). Here we discuss in detail a method which allows for a systematic search for clustering solutions without making assumptions on their number or shape, first introduced in Ben-Hur et al. (2001).

In our Support Vector Clustering (SVC) algorithm data points are mapped from data space to a high-dimensional feature space using a Gaussian kernel. In feature space we look for the smallest sphere that encloses the image of the data. This sphere is mapped back to data space, where it forms a set of contours which enclose the data points. These contours are interpreted as cluster boundaries. Points enclosed by each separate contour are associated with the same cluster. As the width parameter of the Gaussian kernel is decreased, the number of disconnected contours in data space increases, leading to an increasing number of clusters. Since the contours can be interpreted as delineating the support of the underlying probability distribution, our algorithm can be viewed as one identifying valleys in this probability distribution.

SVC can deal with outliers by employing a soft margin constant that allows the sphere in feature space not to enclose all points. For large values of this parameter, we can also deal with overlapping clusters. In this range our algorithm is similar to the scale-space clustering method of Roberts (1997), which is based on a Parzen window estimate of the probability density with a Gaussian kernel function.

In the next section we define the SVC algorithm. In Section 3 it is applied to problems with and without outliers. We first describe a problem without outliers to illustrate the type of clustering boundaries and clustering solutions that are obtained by varying the scale of the Gaussian kernel. Then we proceed to discuss problems that necessitate invoking outliers in order to obtain smooth clustering boundaries. These problems include two standard benchmark examples.

## 2. The SVC Algorithm

### 2.1 Cluster Boundaries

Following Schölkopf et al. (2000) and Tax and Duin (1999), we formulate a support vector description of a data set, which is used as the basis of our clustering algorithm. Let $\{x_i\} \subseteq \mathcal{X}$ be a data set of $N$ points, with $\mathcal{X} \subseteq \mathbb{R}^d$ the data space. Using a nonlinear transformation $\Phi$ from $\mathcal{X}$ to some high-dimensional feature space, we look for the smallest enclosing sphere of radius $R$. This is described by the constraints

$$\|\Phi(x_j) - a\|^2 \le R^2 \quad \forall j,$$

where $\|\cdot\|$ is the Euclidean norm and $a$ is the center of the sphere. Soft constraints are incorporated by adding slack variables $\xi_j$:

$$\|\Phi(x_j) - a\|^2 \le R^2 + \xi_j \tag{1}$$

with $\xi_j \ge 0$. To solve this problem we introduce the Lagrangian

$$L = R^2 - \sum_j \left(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\right)\beta_j - \sum_j \xi_j \mu_j + C \sum_j \xi_j, \tag{2}$$

where $\beta_j \ge 0$ and $\mu_j \ge 0$ are Lagrange multipliers, $C$ is a constant, and $C \sum_j \xi_j$ is a penalty term. Setting to zero the derivative of $L$ with respect to $R$, $a$ and $\xi_j$, respectively, leads to

$$\sum_j \beta_j = 1, \tag{3}$$

$$a = \sum_j \beta_j \Phi(x_j), \tag{4}$$

$$\beta_j = C - \mu_j. \tag{5}$$

The KKT complementarity conditions of Fletcher (1987) result in

$$\xi_j \mu_j = 0, \tag{6}$$

$$\left(R^2 + \xi_j - \|\Phi(x_j) - a\|^2\right)\beta_j = 0. \tag{7}$$

It follows from Eq. (7) that the image of a point $x_i$ with $\xi_i > 0$ and $\beta_i > 0$ lies outside the feature-space sphere. Eq. (6) states that such a point has $\mu_i = 0$, hence we conclude from Eq. (5) that $\beta_i = C$. This will be called a bounded support vector, or BSV. A point $x_i$ with $\xi_i = 0$ is mapped to the inside or to the surface of the feature-space sphere. If its $0 < \beta_i < C$, then Eq. (7) implies that its image $\Phi(x_i)$ lies on the surface of the feature-space sphere. Such a point will be referred to as a support vector, or SV. SVs lie on cluster boundaries, BSVs lie outside the boundaries, and all other points lie inside them. Note that when $C \ge 1$ no BSVs exist because of the constraint (3).

Using these relations we may eliminate the variables $R$, $a$ and $\mu_j$, turning the Lagrangian into the Wolfe dual form, which is a function of the variables $\beta_j$:

$$W = \sum_j \Phi(x_j)^2 \beta_j - \sum_{i,j} \beta_i \beta_j \, \Phi(x_i) \cdot \Phi(x_j). \tag{8}$$

Since the variables $\mu_j$ do not appear in the Lagrangian, they may be replaced with the constraints

$$0 \le \beta_j \le C, \quad j = 1, \ldots, N. \tag{9}$$

We follow the SV method and represent the dot products $\Phi(x_i) \cdot \Phi(x_j)$ by an appropriate Mercer kernel $K(x_i, x_j)$. Throughout this paper we use the Gaussian kernel

$$K(x_i, x_j) = e^{-q \|x_i - x_j\|^2}, \tag{10}$$

with width parameter $q$. As noted in Tax and Duin (1999), polynomial kernels do not yield tight contour representations of a cluster. The Lagrangian $W$ is now written as

$$W = \sum_j K(x_j, x_j)\,\beta_j - \sum_{i,j} \beta_i \beta_j\, K(x_i, x_j). \tag{11}$$
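The dual (8)-(11) is a small quadratic program: maximize $W$ subject to (3) and (9). The paper solves it with SMO (see Section 5); purely as an illustration, the sketch below hands Eq. (11) to a generic constrained solver instead. The function names and the choice of SciPy's SLSQP routine are ours, not the authors'.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_kernel(X, q):
    # Eq. (10): K[i, j] = exp(-q * ||x_i - x_j||^2)
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    return np.exp(-q * d2)

def solve_dual(K, C):
    """Maximize W(b) = sum_j b_j K_jj - sum_ij b_i b_j K_ij  (Eq. 11)
    subject to sum_j b_j = 1 (Eq. 3) and 0 <= b_j <= C (Eq. 9)."""
    N = K.shape[0]
    diag = np.diag(K)                       # equals 1 for the Gaussian kernel
    neg_W = lambda b: -(diag @ b) + b @ K @ b
    grad = lambda b: -diag + 2.0 * K @ b    # valid since K is symmetric
    constraint = {"type": "eq", "fun": lambda b: b.sum() - 1.0,
                  "jac": lambda b: np.ones_like(b)}
    res = minimize(neg_W, np.full(N, 1.0 / N), jac=grad,
                   bounds=[(0.0, C)] * N, constraints=[constraint],
                   method="SLSQP")
    return res.x                            # the multipliers beta_j
```

The uniform starting point $\beta_j = 1/N$ already satisfies both constraints, which keeps the solver well behaved for this sketch.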

At each point $x$ we define the distance of its image in feature space from the center of the sphere:

$$R^2(x) = \|\Phi(x) - a\|^2. \tag{12}$$

In view of (4) and the definition of the kernel we have

$$R^2(x) = K(x, x) - 2 \sum_j \beta_j K(x_j, x) + \sum_{i,j} \beta_i \beta_j K(x_i, x_j). \tag{13}$$

The radius of the sphere is

$$R = \{\, R(x_i) \mid x_i \text{ is a support vector} \,\}. \tag{14}$$

The contours that enclose the points in data space are defined by the set

$$\{\, x \mid R(x) = R \,\}. \tag{15}$$

They are interpreted by us as forming cluster boundaries (see Figures 1 and 3). In view of Equation (14), SVs lie on cluster boundaries, BSVs are outside, and all other points lie inside the clusters.

### 2.2 Cluster Assignment

The cluster description algorithm does not differentiate between points that belong to different clusters. To do so, we use a geometric approach involving $R(x)$, based on the following observation: given a pair of data points that belong to different components (clusters), any path that connects them must exit from the sphere in feature space. Therefore, such a path contains a segment of points $y$ such that $R(y) > R$. This leads to the definition of the adjacency matrix $A_{ij}$ between pairs of points $x_i$ and $x_j$ whose images lie in or on the sphere in feature space:

$$A_{ij} = \begin{cases} 1 & \text{if, for all } y \text{ on the line segment connecting } x_i \text{ and } x_j, \ R(y) \le R, \\ 0 & \text{otherwise.} \end{cases} \tag{16}$$

Clusters are now defined as the connected components of the graph induced by $A$. Checking the line segment is implemented by sampling a number of points (20 points were used in our numerical experiments).

BSVs are unclassified by this procedure since their feature-space images lie outside the enclosing sphere. One may decide either to leave them unclassified, or to assign them to the cluster that they are closest to, as we will do in the examples studied below.

## 3. Examples

The shape of the enclosing contours in data space is governed by two parameters: $q$, the scale parameter of the Gaussian kernel, and $C$, the soft margin constant. In the examples studied in this section we will demonstrate the effects of these two parameters.
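Equations (13)-(16) translate almost directly into code. The sketch below (our naming; it reuses `gaussian_kernel` and assumes a `beta` from `solve_dual` above) evaluates $R^2(x)$, takes the radius from the support vectors, and labels the connected components of the sampled adjacency graph, leaving BSVs unlabeled as described in the text. It assumes at least one strict SV exists.

```python
import numpy as np

def r2(y, X, beta, q, offset):
    # Eq. (13); for the Gaussian kernel K(y, y) = 1, and the constant
    # sum_ij beta_i beta_j K(x_i, x_j) is precomputed as `offset`.
    return 1.0 - 2.0 * beta @ np.exp(-q * ((X - y) ** 2).sum(axis=1)) + offset

def assign_clusters(X, beta, q, C, n_seg=20, tol=1e-7):
    """Connected components of the adjacency graph of Eq. (16).
    BSVs (beta_j = C) keep the label -1, as in Section 2.2."""
    K = gaussian_kernel(X, q)
    offset = beta @ K @ beta
    sv = (beta > tol) & (beta < C - tol)
    # Eq. (14); max over SVs is a numerical-robustness choice of ours
    R2 = max(r2(X[i], X, beta, q, offset) for i in np.where(sv)[0])
    inside = np.where(beta < C - tol)[0]           # SVs and interior points
    ts = np.linspace(0.0, 1.0, n_seg)[:, None]
    def adjacent(i, j):                            # Eq. (16), sampled segment
        seg = X[i] + ts * (X[j] - X[i])
        return all(r2(y, X, beta, q, offset) <= R2 + tol for y in seg)
    labels = np.full(len(X), -1)
    current = 0
    for s in inside:                               # depth-first component search
        if labels[s] != -1:
            continue
        stack, labels[s] = [s], current
        while stack:
            u = stack.pop()
            for v in inside:
                if labels[v] == -1 and adjacent(u, v):
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels
```

The default of 20 sample points per segment follows the paper's numerical experiments.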

*Figure 1: Clustering of a data set containing 183 points using SVC with $C = 1$. Support vectors are designated by small circles, and cluster assignments are represented by different grey scales of the data points. (a) $q = 1$; (b) $q = 20$; (c) $q = 24$; (d) $q = 48$.*

### 3.1 Example without BSVs

We begin with a data set in which the separation into clusters can be achieved without invoking outliers, i.e., with $C = 1$. Figure 1 demonstrates that as the scale parameter of the Gaussian kernel, $q$, is increased, the shape of the boundary in data space varies: with increasing $q$ the boundary fits the data more tightly, and at several $q$ values the enclosing contour splits, forming an increasing number of components (clusters). Figure 1a has the smoothest cluster boundary, defined by six SVs. With increasing $q$, the number of support vectors $n_{sv}$ increases. This is demonstrated in Figure 2, where we plot $n_{sv}$ as a function of $q$ for the data considered in Figure 1.

### 3.2 Example with BSVs

In real data, clusters are usually not as well separated as in Figure 1. Thus, in order to observe splitting of contours, we must allow for BSVs. The number of outliers is controlled by the parameter $C$.

*Figure 2: Number of SVs as a function of $q$ for the data of Figure 1. Contour splitting points are denoted by vertical lines.*

From the constraints (3, 9) it follows that

$$n_{bsv} < 1/C, \tag{17}$$

where $n_{bsv}$ is the number of BSVs. Thus $1/(NC)$ is an upper bound on the fraction of BSVs, and it is more natural to work with the parameter

$$p = \frac{1}{NC}. \tag{18}$$

Asymptotically (for large $N$), the fraction of outliers tends to $p$, as noted in Schölkopf et al. (2000).

When distinct clusters are present, but some outliers (e.g., due to noise) prevent contour separation, it is very useful to employ BSVs. This is demonstrated in Figure 3a: without BSVs contour separation does not occur for the two outer rings for any value of $q$. When some BSVs are present, the clusters are separated easily (Figure 3b). The difference between data that are contour-separable without BSVs and data that require use of BSVs is illustrated schematically in Figure 4. A small overlap between the two probability distributions that generate the data is enough to prevent separation if there are no BSVs.

In the spirit of the examples displayed in Figures 1 and 3, we propose to use SVC iteratively: starting with a low value of $q$ at which there is a single cluster, and increasing it, to observe the formation of an increasing number of clusters, as the Gaussian kernel describes the data with increasing precision.
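Equation (18) is pure bookkeeping between the outlier-fraction view and the box constraint of Eq. (9); a two-line helper of our own devising makes the correspondence explicit.

```python
def box_constraint(p, N):
    # Eq. (18): p = 1/(N C)  =>  C = 1/(N p).
    # p = 1/N gives C = 1, so no BSVs are possible (constraint (3));
    # p -> 1 gives C -> 1/N, so almost all points may become BSVs.
    return 1.0 / (N * p)
```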

*Figure 3: Clustering with and without BSVs. The inner cluster is composed of 50 points generated from a Gaussian distribution. The two concentric rings contain 150/300 points, generated from a uniform angular distribution and a radial Gaussian distribution. (a) The rings cannot be distinguished when $C = 1$. Shown here is $q = 3.5$, the lowest value that leads to separation of the inner cluster. (b) Outliers allow easy clustering. The parameters are $p = 0.3$ and $q = 1.0$.*

*Figure 4: Clusters with overlapping density functions require the introduction of BSVs.*

If, however, the number of SVs is excessive, i.e., a large fraction of the data turns into SVs (Figure 3a), or a number of singleton clusters form, one should increase $p$ to allow these points to turn into outliers, thus facilitating contour separation (Figure 3b). As $p$ is increased, not only does the number of BSVs increase, but their influence on the shape of the cluster contour decreases, as shown in Ben-Hur et al. (2000). The number of support vectors depends on both $q$ and $p$.

For fixed $q$, as $p$ is increased, the number of SVs decreases, since some of them turn into BSVs and the contours become smoother (see Figure 3).

## 4. Strongly Overlapping Clusters

Our algorithm may also be useful in cases where clusters strongly overlap; however, a different interpretation of the results is then required. We propose to use in such a case a high-BSV regime, and to reinterpret the sphere in feature space as representing cluster cores, rather than the envelope of all data.

Note that Equation (15) for the reflection of the sphere in data space can be expressed as

$$\Big\{\, x \ \Big|\ \sum_j \beta_j K(x_j, x) = \rho \,\Big\}, \tag{19}$$

where $\rho$ is determined by the value of this sum on the support vectors. The set of points enclosed by the contour is

$$\Big\{\, x \ \Big|\ \sum_j \beta_j K(x_j, x) > \rho \,\Big\}. \tag{20}$$

In the extreme case when almost all data points are BSVs ($p \to 1$), the sum in this expression,

$$P_{svc}(x) = \sum_j \beta_j K(x_j, x), \tag{21}$$

is approximately equal to

$$P_w(x) = \frac{1}{N} \sum_j K(x_j, x). \tag{22}$$

This last expression is recognized as a Parzen window estimate of the density function (up to a normalization factor, if the kernel is not appropriately normalized); see Duda et al. (2001). In this high-BSV regime, we expect the contour in data space to enclose a small number of points which lie near the maximum of the Parzen-estimated density. In other words, the contour specifies the *core* of the probability distribution. This is schematically represented in Figure 5.

In this regime our algorithm is closely related to the scale-space algorithm proposed by Roberts (1997). He defines cluster centers as maxima of the Parzen window estimator $P(x)$. The Gaussian kernel plays an important role in his analysis: it is the only kernel for which the number of maxima (hence the number of clusters) is a monotonically non-decreasing function of $q$. This is the counterpart of contour splitting in SVC. As an example we study the crab data set of Ripley (1996) in Figure 6. We plot the topographic maps of $P_w$ and $P_{svc}$ in the high-BSV regime. The two maps are very similar. In Figure 6a we present the SVC clustering assignment. Figure 6b shows the original classification superimposed on the topographic map of $P_w$. In the scale-space clustering approach it is difficult to identify the bottom right cluster, since there is only a small region that attracts points to this local maximum. We propose to first identify the contours that form cluster cores, the dark contours in Figure 6a, and then associate points (including BSVs) to clusters according to their distances from cluster cores.
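The step from (21) to (22) works because in the $p \to 1$ limit the box constraint pins almost every $\beta_j$ at $C = 1/(Np) \approx 1/N$, so the weighted sum becomes uniform. A short numerical check of that claim, with helper names that are ours:

```python
import numpy as np

def p_svc(y, X, beta, q):
    # Eq. (21): P_svc(y) = sum_j beta_j K(x_j, y)
    return beta @ np.exp(-q * ((X - y) ** 2).sum(axis=1))

def p_parzen(y, X, q):
    # Eq. (22): P_w(y) = (1/N) sum_j K(x_j, y), an unnormalized
    # Parzen window estimate of the density
    return np.exp(-q * ((X - y) ** 2).sum(axis=1)).mean()

# For beta obtained from solve_dual with p close to 1 (C close to 1/N),
# p_svc(y, X, beta, q) should approach p_parzen(y, X, q) at every y.
```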

*Figure 5: In the case of significant overlap between clusters, the algorithm identifies clusters according to dense cores, or maxima of the underlying probability distribution.*

*Figure 6: Ripley's crab data displayed on a plot of their 2nd and 3rd principal components: (a) topographic map of $P_{svc}(x)$ and SVC cluster assignments; cluster core boundaries are denoted by bold contours; parameters were $q = 4.8$, $p = 0.7$. (b) The Parzen window topographic map $P_w(x)$ for the same $q$ value, and the data represented by the original classification given by Ripley (1996).*

The computational advantage of SVC over Roberts' method is that, instead of solving a problem with many local maxima, we identify core boundaries by an SV method with a global optimal solution. The conceptual advantage of our method is that we define a region, rather than just a peak, as the core of the cluster.

*Figure 7: Cluster boundaries of the iris data set analyzed in a two-dimensional space spanned by the first two principal components. Parameters used are $q = 6.0$, $p = 0.6$.*

### 4.1 The Iris Data

We ran SVC on the iris data set of Fisher (1936), which is a standard benchmark in the pattern recognition literature, and can be obtained from Blake and Merz (1998). The data set contains 150 instances, each composed of four measurements of an iris flower. There are three types of flowers, represented by 50 instances each. Clustering of this data in the space of its first two principal components is depicted in Figure 7 (data was centered prior to extraction of principal components). One of the clusters is linearly separable from the other two by a clear gap in the probability distribution. The remaining two clusters have significant overlap, and were separated at $q = 6.0$, $p = 0.6$. However, at these values of the parameters, the third cluster split into two (see Figure 7). When these two clusters are considered together, the result is 2 misclassifications. Adding the third principal component we obtained the three clusters at $q = 7.0$, $p = 0.70$, with four misclassifications. With the fourth principal component the number of misclassifications increased to 14 (using $q = 9.0$, $p = 0.75$). In addition, the number of support vectors increased with increasing dimensionality (18 in 2 dimensions, 23 in 3 dimensions and 34 in 4 dimensions). The improved performance in 2 or 3 dimensions can be attributed to the noise reduction effect of PCA. Our results compare favorably with other non-parametric clustering algorithms: the information-theoretic approach of Tishby and Slonim (2001) leads to 5 misclassifications, and the SPC algorithm of Blatt et al. (1997), when applied to the data set in the original data space, has 15 misclassifications. For high-dimensional data sets, e.g. the Isolet data set, which has 617 dimensions, the problem was obtaining a support vector description: the number of support vectors jumped from very few (one cluster) to all data points being support vectors (every point in a separate cluster). Using PCA to reduce the dimensionality produced data that clustered well.
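For concreteness, the pieces sketched above can be chained into the two-principal-component iris experiment roughly as follows. This assumes the earlier helper sketches (`gaussian_kernel`, `solve_dual`, `box_constraint`, `assign_clusters`) are in scope, and uses scikit-learn's copy of the data in place of Blake and Merz (1998); it is an illustration, not the authors' code.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X4 = load_iris().data                      # 150 flowers, 4 measurements each
X = PCA(n_components=2).fit_transform(X4)  # centered data, first two PCs
q, p = 6.0, 0.6                            # parameter values from the text
C = box_constraint(p, len(X))              # Eq. (18)
beta = solve_dual(gaussian_kernel(X, q), C)
labels = assign_clusters(X, beta, q, C)    # BSVs stay -1; optionally assign
                                           # each one to its nearest cluster
```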

### 4.2 Varying $q$ and $p$

We propose to use SVC as a "divisive" clustering algorithm, see Jain and Dubes (1988): starting from a small value of $q$ and increasing it. The initial value of $q$ may be chosen as

$$q = \frac{1}{\max_{i,j} \|x_i - x_j\|^2}. \tag{23}$$

At this scale all pairs of points produce a sizeable kernel value, resulting in a single cluster. At this value no outliers are needed, hence we choose $C = 1$.

As $q$ is increased we expect to find bifurcations of clusters. Although this may look like hierarchical clustering, we have found counterexamples when using BSVs. Thus strict hierarchy is not guaranteed, unless the algorithm is applied separately to each cluster rather than to the whole data set. We do not pursue this choice here, in order to show how the cluster structure is unraveled as $q$ is increased. Starting out with $p = 1/N$, or $C = 1$, we do not allow for any outliers. If, as $q$ is being increased, clusters of single or few points break off, or cluster boundaries become very rough (as in Figure 3a), $p$ should be increased in order to investigate what happens when BSVs are allowed. In general, a good criterion seems to be the number of SVs: a low number guarantees smooth boundaries. As $q$ increases this number increases, as in Figure 2. If the number of SVs is excessive, $p$ should be increased, whereby many SVs may be turned into BSVs, and smooth cluster (or core) boundaries emerge, as in Figure 3b. In other words, we propose to systematically increase $q$ and $p$ along a direction that guarantees a minimal number of SVs. A second criterion for good clustering solutions is the stability of cluster assignments over some range of the two parameters.

An important issue in the divisive approach is the decision when to stop dividing the clusters. Many approaches to this problem exist, such as Milligan and Cooper (1985) and Ben-Hur et al. (2002) (and references therein). However, we believe that in our SV setting it is natural to use the number of support vectors as an indication of a meaningful solution, as described above. Hence we should stop SVC when the fraction of SVs exceeds some threshold.

## 5. Complexity

The quadratic programming problem of Equation (2) can be solved by the SMO algorithm of Platt (1999), which was proposed as an efficient tool for SVM training in the supervised case. Some minor modifications are required to adapt it to the unsupervised training problem addressed here; see Schölkopf et al. (2000). Benchmarks reported in Platt (1999) show that this algorithm converges after approximately $O(N^2)$ kernel evaluations. The complexity of the labeling part of the algorithm is $O((N - n_{bsv})^2\, n_{sv})$, so that the overall complexity is $O(N^2)$ if the number of support vectors is $O(1)$. We use a heuristic to lower this estimate: we do not compute the whole adjacency matrix, but only adjacencies with support vectors. This gave the same results on the data sets we have tried, and lowers the complexity to $O((N - n_{bsv})\, n_{sv})$. We also note that the memory requirements of the SMO algorithm are low: it can be implemented using $O(1)$ memory at the cost of a decrease in efficiency. This makes SVC useful even for very large data sets.
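Equation (23) and the SV-count criterion suggest a simple divisive driver. The sketch below (our naming; the geometric grid for $q$ is an arbitrary choice, and it reuses the earlier `gaussian_kernel`, `solve_dual` and `assign_clusters` sketches) grows $q$ from the single-cluster scale and stops once the SV fraction becomes excessive, at which point one would raise $p$ as described above.

```python
import numpy as np

def initial_q(X):
    # Eq. (23): q = 1 / max_ij ||x_i - x_j||^2, the single-cluster scale
    return 1.0 / ((X[:, None] - X[None, :]) ** 2).sum(-1).max()

def svc_path(X, p=None, growth=1.5, max_steps=10, sv_stop=0.5, tol=1e-7):
    """Divisive schedule of Section 4.2: start at the Eq. (23) scale with
    C = 1 (p = 1/N, no outliers) and grow q until SVs become excessive."""
    N = len(X)
    C = 1.0 if p is None else 1.0 / (N * p)    # Eq. (18)
    q, results = initial_q(X), []
    for _ in range(max_steps):
        beta = solve_dual(gaussian_kernel(X, q), C)
        n_sv = int(((beta > tol) & (beta < C - tol)).sum())
        if n_sv > sv_stop * N:                 # stopping criterion (Section 4.2)
            break                              # raise p instead of growing q
        results.append((q, assign_clusters(X, beta, q, C)))
        q *= growth                            # probe the data at a finer scale
    return results
```

Checking that cluster assignments stay stable across neighboring $(q, p)$ values in `results` implements the paper's second criterion for a good solution.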

## 6. Discussion

We have proposed a novel clustering method, SVC, based on the SVM formalism. Our method has no explicit bias on either the number or the shape of clusters. It has two parameters, allowing it to obtain various clustering solutions. The parameter $q$ of the Gaussian kernel determines the scale at which the data is probed, and as it is increased clusters begin to split. The other parameter, $p$, is the soft margin constant that controls the number of outliers. This parameter enables analyzing noisy data points and separating between overlapping clusters. This is in contrast with most clustering algorithms found in the literature, which have no mechanism for dealing with noise or outliers. However, we note that for clustering instances with strongly overlapping clusters SVC can delineate only relatively small cluster cores. An alternative for overlapping clusters is to use a support vector description for each cluster. Preliminary results in this direction are found in Ben-Hur et al. (2000).

A unique advantage of our algorithm is that it can generate cluster boundaries of arbitrary shape, whereas other algorithms that use a geometric representation are most often limited to hyper-ellipsoids; see Jain and Dubes (1988). In this respect SVC is reminiscent of the method of Lipson and Siegelmann (2000), where high-order neurons define a high-dimensional feature space. Our algorithm has a distinct advantage over the latter: being based on a kernel method, it avoids explicit calculations in the high-dimensional feature space, and hence is more efficient.

In the high-$p$ regime SVC becomes similar to the scale-space approach that probes the cluster structure using a Gaussian Parzen window estimate of the probability density, where cluster centers are defined by the local maxima of the density. Our method has the computational advantage of relying on the SVM quadratic optimization that has one global solution.

## References

A. Ben-Hur, A. Elisseeff, and I. Guyon. A stability based method for discovering structure in clustered data. In *Pacific Symposium on Biocomputing*, 2002.

A. Ben-Hur, D. Horn, H.T. Siegelmann, and V. Vapnik. A support vector clustering method. In *International Conference on Pattern Recognition*, 2000.

A. Ben-Hur, D. Horn, H.T. Siegelmann, and V. Vapnik. A support vector clustering method. In *Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference*, Todd K. Leen, Thomas G. Dietterich and Volker Tresp, eds., 2001.

C.L. Blake and C.J. Merz. UCI repository of machine learning databases, 1998.

Marcelo Blatt, Shai Wiseman, and Eytan Domany. Data clustering using a model granular magnet. *Neural Computation*, 9(8):1805-1842, 1997.

R.O. Duda, P.E. Hart, and D.G. Stork. *Pattern Classification*. John Wiley & Sons, New York, 2001.

R.A. Fisher. The use of multiple measurements in taxonomic problems. *Annals of Eugenics*, 7:179-188, 1936.

R. Fletcher. *Practical Methods of Optimization*. Wiley-Interscience, Chichester, 1987.

K. Fukunaga. *Introduction to Statistical Pattern Recognition*. Academic Press, San Diego, CA, 1990.

A.K. Jain and R.C. Dubes. *Algorithms for Clustering Data*. Prentice Hall, Englewood Cliffs, NJ, 1988.

H. Lipson and H.T. Siegelmann. Clustering irregular shapes using high-order neurons. *Neural Computation*, 12:2331-2353, 2000.

J. MacQueen. Some methods for classification and analysis of multivariate observations. In *Proc. 5th Berkeley Symposium on Mathematical Statistics and Probability*, Vol. 1, 1965.

G.W. Milligan and M.C. Cooper. An examination of procedures for determining the number of clusters in a data set. *Psychometrika*, 50:159-179, 1985.

J. Platt. Fast training of support vector machines using sequential minimal optimization. In *Advances in Kernel Methods: Support Vector Learning*, B. Schölkopf, C.J.C. Burges, and A.J. Smola, editors, 1999.

B.D. Ripley. *Pattern Recognition and Neural Networks*. Cambridge University Press, Cambridge, 1996.

S.J. Roberts. Non-parametric unsupervised cluster analysis. *Pattern Recognition*, 30(2):261-272, 1997.

B. Schölkopf, R.C. Williamson, A.J. Smola, J. Shawe-Taylor, and J. Platt. Support vector method for novelty detection. In *Advances in Neural Information Processing Systems 12: Proceedings of the 1999 Conference*, Sara A. Solla, Todd K. Leen and Klaus-Robert Müller, eds., 2000.

Bernhard Schölkopf, John C. Platt, John Shawe-Taylor, Alex J. Smola, and Robert C. Williamson. Estimating the support of a high-dimensional distribution. *Neural Computation*, 13:1443-1471, 2001.

R. Shamir and R. Sharan. Algorithmic approaches to clustering gene expression data. In T. Jiang, T. Smith, Y. Xu, and M.Q. Zhang, editors, *Current Topics in Computational Biology*, 2000.

D.M.J. Tax and R.P.W. Duin. Support vector domain description. *Pattern Recognition Letters*, 20:1991-1999, 1999.

N. Tishby and N. Slonim. Data clustering by Markovian relaxation and the information bottleneck method. In *Advances in Neural Information Processing Systems 13: Proceedings of the 2000 Conference*, Todd K. Leen, Thomas G. Dietterich and Volker Tresp, eds., 2001.

V. Vapnik. *The Nature of Statistical Learning Theory*. Springer, New York, 1995.