Clutter Reduction in Multi-Dimensional Data Visualization Using Dimension Reordering*

Wei Peng, Matthew O. Ward and Elke A. Rundensteiner
Computer Science Department
Worcester Polytechnic Institute
Worcester, MA 01609
{debbie,matt,rundenst}@cs.wpi.edu

* This work was supported under NSF grant IIS-0119276.

ABSTRACT

Clutter denotes a disordered collection of graphical entities in information visualization. Clutter can obscure the structure present in the data. Even in a small dataset, clutter can make it hard for the viewer to find patterns and reveal relationships. In this paper, we present the concept of clutter-based dimension reordering. Our hope is to reduce clutter without reducing information content or disturbing the data in any way. Dimension order is a variable that can significantly affect a visualization's expressiveness. By varying the dimension order in visualizations, our goal is to find the views with the least amount of visual clutter. Clutter reduction is a display-dependent task. We define different measures of what constitutes clutter in terms of display properties for four different visualization techniques. We then apply dimension ordering algorithms to search for an order that minimizes the clutter in a display.

CR Categories: H.5.2 [Information Interfaces and Presentation]: User Interfaces - Graphical user interfaces; H.2.8 [Database Management]: Database Applications - Data mining; I.5.3 [Pattern Recognition]: Clustering - Similarity Measures

Keywords: multidimensional visualization, dimension ordering, visual clutter, visual structure

1 INTRODUCTION

Visualization is the graphical presentation of information, with the goal of helping the user gain a qualitative understanding of the information. A good visualization clearly reveals structure within the data and thus can help the viewer to better identify patterns and detect outliers. Clutter, on the other hand, is characterized by crowded and disordered visual entities that obscure the structure in visual displays. Clutter is certainly undesirable since it hinders viewers' understanding of the display's content. However, when the number of dimensions or data items grows large, it is inevitable that users will encounter clutter, no matter what visual method is used.

Many clutter reduction techniques deal with data of high volume or high dimensionality, such as hierarchical clustering, sampling, and filtering. But they may result in some information loss. Distortion is another category of methods for clutter reduction. But distorted views do not give an unbiased representation of the data content because spatial relationships are modified. To complement these approaches, and to help the user reduce clutter in some traditional visualization techniques while retaining the information in the display, we propose a clutter reduction technique using dimension reordering.

In many multivariate visualization techniques, such as parallel coordinates [6], glyphs [14], scatterplot matrices [1] and pixel-oriented methods [9], dimensions are positioned in some one- or two-dimensional arrangement on the screen [24]. Given the 2-D nature of this medium, the arrangement must choose some order of dimensions. This arrangement can have a major impact on the expressiveness of the visualization. Different orders of dimensions can reveal different aspects of the data and affect the perceived clutter and structure in the display. Thus completely different conclusions can be drawn based on the available display. Unfortunately, in many existing visualization systems that encompass these techniques, dimensions are usually ordered without much care. In fact, the dimension order is often simply the default order in the original dataset. Manual dimension ordering is available in some systems. For example, Polaris [18] allows users to manually select and order the dimensions to be mapped to the display. Similarly, in XmdvTool [24], users can manually change the order of dimensions from a reconfigurable list of dimensions. However, an exhaustive manual search for the best ordering is tedious even for a modest number of dimensions. At the same time it lacks a quantitative measurement of quality. Automatic clutter-based dimension ordering techniques would therefore remedy this shortcoming of current tools.

Clutter reduction is a visualization-dependent task because visualization techniques vary largely from one to another. The basic goal of this paper is to present clutter measuring and reduction approaches for several visualization techniques, namely parallel coordinates [6], scatterplot matrices [1], star glyphs [17], and dimensional stacking [12].

In order to automate the dimension reordering process for a display, we are concerned with two issues: (1) designing a metric to measure visual clutter, and (2) developing an algorithm to reorder the dimensions for the purpose of clutter reduction. The solutions we provide must be specifically tuned to each individual visualization technique. In some techniques, we want to reduce the level of noise that tends to obscure the structure in the display; in other cases we want to increase the number of clusters. In some cases we even want to do both. For each technique, we will study both issues. First, we will carefully define a metric for measuring clutter. Second, we will choose one algorithm from the possible solution candidates to arrange the dimensions. Third, we will compare the results with the original display.

Our technique targets small to medium-sized datasets in terms of dimensionality. Although we only chose four visualization techniques to experiment with, there are many more traditional visualization techniques that can benefit from this concept.

The remainder of this paper is organized as follows. Section 2 provides a review of related work. Sections 3, 4, 5, and 6 discuss the clutter definitions and measures for the four visualization techniques respectively. In Section 7, algorithms for reordering are presented. Conclusions and future work are presented in Section 8.

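The two issues named in the introduction, a clutter metric and a search over dimension orders, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `best_order` and `adjacent_cost` are hypothetical, and the pairwise cost matrix is invented toy data standing in for whatever display-specific clutter measure is used.

```python
from itertools import permutations
from typing import Callable, Sequence, Tuple


def best_order(n_dims: int,
               clutter: Callable[[Sequence[int]], float]) -> Tuple[Tuple[int, ...], float]:
    """Exhaustively try every dimension order and keep the least cluttered one.

    This visits n! orders, so it is only practical for a small number of dimensions.
    """
    best, best_score = None, float("inf")
    for order in permutations(range(n_dims)):
        score = clutter(order)
        if score < best_score:  # strict '<' keeps the first order found on ties
            best, best_score = order, score
    return best, best_score


def adjacent_cost(pair_cost):
    """Build a clutter metric that sums a cost over neighboring dimensions.

    pair_cost[a][b] is an assumed, symmetric cost of placing dimension a next to b.
    """
    def clutter(order):
        return sum(pair_cost[a][b] for a, b in zip(order, order[1:]))
    return clutter


# Toy, made-up pairwise costs for four dimensions (illustrative only).
pair_cost = [
    [0, 5, 1, 4],
    [5, 0, 2, 1],
    [1, 2, 0, 6],
    [4, 1, 6, 0],
]
order, score = best_order(4, adjacent_cost(pair_cost))
print(order, score)
```

Keeping the metric as a plain callable means the same search loop can be reused with a different clutter measure for each visualization technique.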
2 RELATED WORK

To overcome the clutter problem, many approaches have been proposed. Distortion [16, 13] is a widely used technique for clutter reduction. In visualizations supporting distortion-oriented techniques, the interesting portion of the data is given more display space. The problem with this technique is that the uninteresting subset of the data is squeezed into a small area, making it difficult for the viewer to fully understand it. Multi-resolution approaches [23, 22, 5] are used to group the data into hierarchical clusters and display them at a desired level of detail. These approaches do not retain all the information in the data, since many details will be filtered out at low resolutions.

High dimensionality is another source of clutter. Many approaches exist for dimension reduction. Principal Component Analysis [8], Multi-dimensional Scaling [11, 21], and Self Organizing Maps [10] are popular dimensionality reduction techniques used in data and information visualization. Yang et al. [26] proposed a visual hierarchical dimension reduction technique that creates meaningful lower dimensional spaces with representative dimensions from the original data instead of from generated new dimensions. These techniques generate a lower dimensional subspace to reduce clutter, but some information in the original data space is also lost. All these approaches unavoidably cause information loss, in one way or another.

Dimension ordering in visualization has been studied in [2, 25]. Ankerst et al. [2] proposed a method to arrange data dimensions according to the similarity between dimensions so that similar ones are put next to each other. They used Euclidean distance as the similarity measure, proved that the arrangement problem is NP-complete, and applied heuristic algorithms to search for the optimal order. Yang et al. [25] imposed a hierarchical structure over the dimensions themselves, grouping a large number of dimensions into hierarchies so that the complexity of the ordering problem is reduced. User interactions are then supported to make it practical for users to actively decide on dimension reduction and ordering in the visualization process. However, in those approaches, dimensions are reordered according to only one particular measure, the similarity between dimensions. In many visualization techniques, the overall clutter in the display is not always related to similarity between dimensions. A visualization with the best correlated dimensions does not guarantee the least clutter. Nevertheless, the idea of using dimension ordering to improve clustering inspired our research into dimension ordering to improve visualization quality.

3 PARALLEL COORDINATES

Parallel coordinates is a technique pioneered in the 1980's that has been applied to a diverse set of multidimensional analysis problems [6]. In this method, each dimension corresponds to an axis, and the N axes are organized as uniformly spaced vertical or horizontal lines. A data element in an N-dimensional space manifests itself as a connected set of points, one on each axis. Thus a polyline is generated for representing one data point.

3.1 Clutter Analysis of Parallel Coordinates

In the parallel coordinates display, as the axes order is changed, the polylines representing data points can be shown with very distinct shapes. In Figures 1 and 2, the two displays illustrate the same dataset with different dimension orders. As can be seen in the figures, a parallel coordinates display makes inter-dimensional relationships between neighboring dimensions very easy to see, but does not at all disclose relationships between non-adjacent dimensions.

In a full display of parallel coordinates without sampling, filtering or multi-resolution processing, if polylines between two dimensions can be naturally grouped into a set of clusters, the user will likely find it easier to comprehend the relationship between them. If instead there are many polylines that don't belong to any cluster, the space between the two dimensions can be very cluttered. These polylines don't help the viewer to find patterns and discover relationships. The data points that don't belong to any cluster are called outliers, and we want to minimize their impact in the display.

3.2 Clutter Measure in Parallel Coordinates

3.2.1 Clutter Definition

Because outliers often obscure structure and thus confuse the user, clutter in parallel coordinates can be defined as the proportion of outliers against the total number of data points. To reduce clutter in this technique, our task is to rearrange the dimensions to minimize the clutter between neighboring dimensions. To calculate the score for a given dimension order, we first count the total number of outliers between neighboring dimensions, $S_{outlier}$. If there are $n$ dimensions, the number of neighboring pairs for a given order is $n-1$. The average outlier number between dimensions is defined to be $S_{avg} = S_{outlier}/(n-1)$. Let $S_{total}$ denote the total number of data points. The clutter $C$, defined as the proportion of outliers, can then be calculated as follows:

$$C = \frac{S_{avg}}{S_{total}} = \frac{S_{outlier}/(n-1)}{S_{total}} \qquad (1)$$

Since $n-1$ and $S_{total}$ are both constant, dimension orders that reduce the total number of outliers also reduce clutter in the display. In this way, we can measure the clutter in the display and then find the best order.

3.2.2 Algorithm for Computing Clutter

Now we are faced with the problem of how to decide if a data item is within a cluster or is an outlier. Since we are only concerned with clusters within pairs of dimensions, we can use the normalized Euclidean distances between data points to measure their closeness. The two-dimensional clustering problem has been discussed intensely in the statistics, pattern recognition and data mining communities. Jain and Dubes' book [7] gives a thorough description of clustering algorithms. Since our purpose is to find outliers that do not have any neighbors close to them, we chose Lu and Fu's [15] nearest-neighbor clustering algorithm.

Suppose a set of data points $P = \{x_1, x_2, \ldots, x_n\}$ is to be partitioned into clusters. Let $k$ denote the cluster number. The user specifies a threshold on the nearest-neighbor distance. The algorithm can be described as follows:

Step 1. Set $i \leftarrow 1$ and $k \leftarrow 1$. Take $x_1$ from $P$. Assign data point $x_1$ to cluster $C_1$.
Step 2. Set $i \leftarrow i+1$. If $x_i$ has not been assigned to any cluster, find the nearest neighbor of $x_i$ among the data points already assigned to clusters. Suppose that the nearest neighbor is in cluster $m$. Let $d_m$ denote the distance from $x_i$ to this neighbor.
Step 3. If $d_m$ does not exceed the threshold, then assign $x_i$ to $C_m$. Otherwise, set $k \leftarrow k+1$ and assign $x_i$ to a new cluster $C_k$.
Step 4. If every point has been assigned to a cluster, stop. Else, go to step 2.

If a cluster contains only one data point, that point is called an outlier. In this way, we are able to find all the data points that don't have any neighbors within the threshold distance in the specified two-dimensional space. We do this for every pair of dimensions and store the outlier numbers in an outlier matrix $M$. Given a dimension order, we can then count the total clutter by adding up the outlier numbers between neighboring dimensions. If the number of dimensions is $n$, this is done in $O(n)$ time. Since the optimal dimension ordering algorithm is an exhaustive search over $O(n!)$ orders, the search time involved is therefore $O(n \cdot n!)$.

Figure 1: Parallel coordinates visualization of the original Cars dataset. Outliers are highlighted in red in Fig. 1-(b).

Figure 2: Parallel coordinates visualization of the Cars dataset after clutter-based dimension reordering. Outliers are highlighted in red in Fig. 2-(b).

3.3 Examples

Figures 1 and 2 both represent the Cars dataset. In Figure 1 the data is displayed with the default dimension ordering. Figure 2 displays the data after being processed with clutter-based ordering. In the rightmost image in each figure, polylines highlighted in red are outliers according to our clutter metric. At a glance we can identify more outliers in the original visualization than in the improved one. It is also clear that, in the new display, the data points are better separated and it is easier for the viewer to find patterns.

4 SCATTERPLOT MATRICES

Scatterplot matrices are one of the oldest and most commonly used methods to project high dimensional data to 2 dimensions [1]. In this method, $N(N-1)/2$ pairwise parallel projections are generated, each giving the viewer a general impression regarding relationships within the data between pairs of dimensions. The projections are arranged in a grid structure to help the user remember the dimensions associated with each projection.

4.1 Clutter Analysis in Scatterplot Matrices

In clutter reduction for scatterplot matrices, we focus on finding structure in plots rather than outliers, because the overall shape and tendency of data points in a plot can reveal a lot of information.

Some work has been done on finding structures in scatterplot visualizations. PRIM-9 [19] is a system that makes use of scatterplots. In PRIM-9 data is projected onto a two-dimensional subspace defined by any pair of dimensions. Thus the user can navigate all the projections and search for the most interesting ones. The data can also be rotated, isolated and masked to help the user find structures that may not be visible in one of the simple orthogonal projections. However, this manual projection pursuit approach is not efficient when dealing with high dimensional datasets. It is also likely to leave structures undetected since it is based on the user's knowledge and perception of the data. Automatic projection pursuit techniques [4] utilize algorithms to detect structure in projections based on the density of clusters and separation of data points in the projection space to aid in finding the most interesting plots.

Figure 3: Scatterplot matrices visualization of the Cars dataset. In Fig. 3-(a) dimensions are randomly positioned. After clutter reduction Fig. 3-(b) is generated. The first four dimensions are ordered with the high-cardinality dimension reordering approach, and the other three dimensions are ordered with the low-cardinality approach.

With a matrix of scatterplots, users are not only able to find plots with structure, but can also view and compare the relationships between these plots. With scatterplot matrices visualizations, all two-dimensional plots are displayed on the screen. Thus changing the dimension order does not result in different projections, but rather in a different placement of the pairwise plots. In practice, it is beneficial for the user to have projections that disclose a related structure placed next or close to each other in order to reveal important dimension relationships in the data. To make this possible, we have defined a clutter measure for scatterplot matrices. The main idea is to find the structure in all 2-dimensional projections and use it to determine the position of dimensions so that plots displaying a similar structure are positioned near each other.

Figure 3 gives two views of a scatterplot matrices visualization. In these visualizations, we can separate the dimensions into two categories: high-cardinality dimensions and low-cardinality dimensions. In high-cardinality dimensions, data values are often continuous, such as height or weight, and can take on any real number within the range. In low-cardinality dimensions, data values are often discrete, such as gender, type, and year. These data points can only take a small number of possible values. It is often perceived that plots involving only high-cardinality dimensions will place dots in a scattered manner while plots involving low-cardinality dimensions will place dots in straight lines because a lot of data points share the same value on this dimension. However, a dimension being continuous or discrete does not tell us whether it has high or low cardinality. In this paper, we determine whether a dimension is high- or low-cardinality depending on the number of data points and their possible values. Let $m_i$ denote the number of possible data values on the $i$-th dimension, and $N$ denote the total number of data points. If $m_i \geq N$, dimension $i$ is considered high-cardinality, otherwise it is low-cardinality.

We will treat high-cardinality dimensions and low-cardinality dimensions separately because they generate different plot shapes. The clutter definition and clutter computation algorithms for these two classes of dimensions will differ from each other.

4.2 High-Cardinality Clutter Measure in Scatterplot Matrices

4.2.1 Clutter Definition

The correlation between two variables reflects the degree to which the variables are associated. The most common measure of correlation is the Pearson Correlation Coefficient, which can be calculated as:

$$r = \frac{\sum_i (x_i - x_m)(y_i - y_m)}{\sqrt{\sum_i (x_i - x_m)^2}\,\sqrt{\sum_i (y_i - y_m)^2}} \qquad (2)$$

where $x_i$ and $y_i$ are the values of the $i$-th data point on the two dimensions, and $x_m$ and $y_m$ represent the mean values of the two dimensions.

Since plots that are similarly correlated will likely display a similar pattern and tendency, we can calculate the correlations for all the two-dimensional plots (in fact half of them are symmetric along the diagonal), and reorder the dimensions so that similar plots are displayed as close to each other as possible. We define the plot side length to be 1 and calculate the distance between plots $X$ and $Y$ using $\sqrt{(Row_X - Row_Y)^2 + (Column_X - Column_Y)^2}$. For example, in Figure 4, the distance between similar plots A and B will be $\sqrt{(1-0)^2 + (1-0)^2} = \sqrt{2}$. The larger this number is for a display, the more cluttered it is. We then define the total distance between similar plots to be the clutter measure.

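The high-cardinality clutter definition above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names `pearson` and `scatterplot_clutter` are hypothetical, the plot in row i and column j of the lower-left half is assumed to show the dimensions at positions i and j of the current order, each pair of similar plots is counted once, and the correlation values in the toy matrix are invented.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length value lists (Equation 2).

    Assumes neither dimension is constant; otherwise the denominator is zero.
    """
    n = len(x)
    xm, ym = sum(x) / n, sum(y) / n
    num = sum((a - xm) * (b - ym) for a, b in zip(x, y))
    den = (math.sqrt(sum((a - xm) ** 2 for a in x))
           * math.sqrt(sum((b - ym) ** 2 for b in y)))
    return num / den


def scatterplot_clutter(corr, order, threshold):
    """Total grid distance between 'similar' plots for one dimension order.

    corr[a][b] is the correlation between dimensions a and b. Only the
    lower-left half of the matrix (row > column) is considered.
    """
    n = len(order)
    plots = [(row, col, corr[order[row]][order[col]])
             for row in range(n) for col in range(row)]
    total = 0.0
    for (r1, c1, p1), (r2, c2, p2) in combinations(plots, 2):
        if abs(p1 - p2) <= threshold:  # the two plots are similarly correlated
            total += math.hypot(r1 - r2, c1 - c2)  # plot side length is 1
    return total


# Toy correlation matrix for three dimensions (made-up values).
corr = [
    [1.0, 0.9, 0.1],
    [0.9, 1.0, 0.85],
    [0.1, 0.85, 1.0],
]
print(scatterplot_clutter(corr, (0, 1, 2), 0.1))
print(scatterplot_clutter(corr, (0, 2, 1), 0.1))
```

In this toy case the two similarly correlated plots sit diagonally apart under the first order and side by side under the second, so the second order yields the lower clutter value.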
Figure 4: Illustration of distance calculation in scatterplot matrices.

4.2.2 Algorithm for Computing Clutter

In high-cardinality dimension space, the approach to calculate the total clutter for a certain dimension ordering is as follows. Let $p_i$ be the $i$-th plot we visit. Let the threshold be the maximum correlation difference between plots that can be called "similar". Note that we are only concerned with the lower-left half of the plots, because the plots are symmetric along the diagonal. The plots along the diagonal are not considered because they only disclose the correlations of dimensions with themselves, which are always 1.

Step 1. A correlation matrix $M(n, n)$ is generated for all $n$ dimensions. $M[i][j]$ represents the Pearson correlation coefficient for the plot on the $i$-th row, $j$-th column.
Step 2. $i \leftarrow 0$. Visit plot $p_0$. Find all the unvisited plots that have a similar correlation to $p_0$, i.e., whose Pearson correlation differs from that of $p_0$ by no more than the threshold. Calculate their distances from $p_0$ on the display, and add them to the total clutter measure.
Step 3. $i \leftarrow i+1$. Visit $p_i$. Find all unvisited plots similar enough to $p_i$. Calculate their distances from $p_i$ on the display, and add them to the total clutter measure.
Step 4. If all plots have been visited, stop. Otherwise go to step 3.

This way, we get a total distance for any scatterplot matrices display. With this measure, we are able to make comparisons between different displays of the same data. Unlike the one-dimensional parallel coordinates display, we have to calculate the distance for every pair of plots. This is an $O(n^2)$ process. We do the exhaustive search for the best ordering, so the total computing time is $O(n^2 \cdot n!)$.

4.3 Low-Cardinality Clutter Measure in Scatterplot Matrices

In low-cardinality dimensions, we also want to place similar plots together, but with a different clutter measure from high-cardinality dimensions.

Plots involving low-cardinality dimensions are very different in display pattern from those only involving high-cardinality dimensions. The user will naturally perceive them as two different types of patterns.

For plots with low-cardinality dimensions, the higher the cardinality, the more crowded the plot seems to be. Therefore, we have a different measure of clutter for these dimensions. Instead of navigating all dimension orders and searching for the best one, we order these dimensions according to their cardinalities. Dimensions with higher cardinality are positioned before lower-cardinality dimensions. In this way, plots with similar density are placed near each other. This satisfies our purpose for clutter reduction. The dot density of plots will appear to decrease gradually, resulting in less clutter, or more perceived order, in the view.

With low-cardinality dimensions, the dimension reordering can be viewed as a sorting problem. With a quicksort algorithm, we can achieve it within $O(n \log n)$ time.

4.4 Example

From Figure 3 we notice that plots generated by two high-cardinality dimensions are very different in pattern from plots involving one or two low-cardinality dimensions. We believe that separating the high- and low-cardinality dimensions from each other will help the user identify similar low-cardinality dimensions and find similar plots more easily in the high-cardinality dimension subspace.

5 STAR GLYPHS

5.1 Clutter Analysis in Star Glyphs

A glyph is a representation of a data element that maps data values to various geometric and color attributes of graphical primitives or symbols [14]. XmdvTool uses star glyphs [17] as one of its four visualization approaches. In this technique, each data element occupies one portion of the display window. Data values control the length of rays emanating from a central point. The rays are then joined by a polyline drawn around the outside of the rays to form a closed polygon.

In star glyph visualization, each glyph represents a different data point. With dimensions ordered differently, the glyph's shape varies. Since glyphs are stand-alone graphical entities, we consider reducing clutter here as making those single data points seem more structured overall. Gestalt Laws are robust rules of pattern perception [20]. They state that similarity and symmetry are two factors that help viewers see patterns in the visual display. Suppose we want to find structure in one glyph. We may call a glyph well structured if its rays are arranged so that they have similar lengths to their neighbors and are well balanced along some axis. In our approach, we define monotonicity and symmetry as our measures of structure for glyphs. The user can therefore find monotonic structure, symmetric structure or a combination of the two in the data.

Let's take monotonicity plus symmetry as an example. Then in a perfectly structured glyph we have:

- Neighboring rays have similar lengths.
- The lengths of rays are ordered in a monotonically increasing or decreasing manner on both sides of an axis.
- Rays of similar lengths are positioned symmetrically along either a horizontal or vertical axis.

The perfectly structured star glyph is thus a teardrop shape. With such shapes in glyphs, the user will find it easier to identify relative value differences between dimensions, and can better discern rays and the bounding polylines. For instance, the data points shown in Figure 5 present very different shapes with different dimension orders. The original order in Fig. 5-(a) makes them look irregular and display a concave shape, while the dimension order in Fig. 5-(b) makes them more symmetric and easier to interpret.

Figure 5: The two glyphs in Fig. 5-(a) represent the same data points as Fig. 5-(b), with a different dimension order.

5.2 Clutter Measure in Star Glyphs

5.2.1 Clutter Definition

To reduce the clutter for the whole display, we seek to reorder the dimensions for the purpose of minimizing the total occurrence of unstructured rays in glyphs. Therefore, we define clutter as the total number of non-monotonic and non-symmetric occurrences. We believe that with more rays in data points displaying a monotonic and symmetric shape, the structure in the visualization will be easier to perceive.

5.2.2 Algorithm for Computing Clutter

In order to calculate clutter in one display, we test every glyph for its monotonicity and symmetry. Suppose the user chooses monotonically increasing and symmetry as the structure measure. The user can then choose a threshold $t_1$ for checking monotonicity, and a threshold $t_2$ for checking symmetry. $t_1$ and $t_2$ are measures for normalized numbers and thus can take any value from 0 to 1. If, for a point's normalized values $p_i$ and $p_{i+1}$ on two neighboring dimensions (dimension $n-1$ and dimension 0 are not considered neighbors), $p_{i+1}$ is less than $p_i$, we check whether $p_i - p_{i+1}$ is less than the threshold $t_1$. If so, we consider this non-monotonicity occurrence tolerable. If not, we add this occurrence to our count of unstructuredness. Similarly, for two dimensions that are symmetrically positioned along the horizontal axis, if their difference is within the threshold $t_2$, they are considered symmetric to each other. Otherwise another increment is added to the total occurrence of unstructuredness.

Figure 6: Star glyph visualizations of the Coal Disaster dataset. Fig. 6-(a) represents the data with the original dimension order, and Fig. 6-(b) shows the data after clutter has been reduced.

The calculation for a single glyph involves going through $n-1$ pairs of neighboring dimensions to check for monotonicity and $n/2$ pairs of dimensions symmetric along the axis. Therefore, for a dataset with $m$ data points, the calculation takes $O(n \cdot m)$. With the exhaustive search for the best ordering, the computational complexity of dimension reordering in star glyphs is then $O(n \cdot m \cdot n!)$.

5.3 Example

For each ordering we can count the unstructuredness occurrences to find the order that minimizes this measure. Figure 6 displays the Coal Disaster dataset before and after clutter reduction. In Fig. 6-(a), many glyphs are displayed in a concave manner, and it is hard to tell the dimensions from the bounding polylines. This situation is improved in Fig. 6-(b) with clutter-based dimension reordering.

6 DIMENSIONAL STACKING

6.1 Clutter Analysis in Dimensional Stacking

The dimensional stacking technique is a recursive projection method developed by LeBlanc et al. [12]. Each dimension of the dataset is first discretized into a user-specified number of bins. Then two dimensions are defined as the horizontal and vertical axis, creating a grid on the display. Within each box of this grid this process is applied again with the next two dimensions. This process continues until all dimensions are assigned. Each data point maps to a single bin based on its values in each dimension.

In this technique, the dimension order determines the orientation of axes and the number of cells within a grid. The innermost dimensions are named the fastest dimensions because along these dimensions two small bins immediately next to each other represent two different ranges of the dimensions. In contrast, the outermost dimensions have the slowest value changing speed, meaning many neighboring bins on these dimensions are within the same value range. Therefore, in dimensional stacking, the order of dimensions has a huge impact on the visual display.

For dimensional stacking, the bins within which data points fall are shown as filled squares. These bins naturally form groups in the display. We hypothesize that a user will consider a dimensional stacking visualization highly structured if it displays these squares mostly in groups. Compared to a display with mainly randomly scattered filled bins, displays that contain a small number of groups can reveal much more information. The data points within a group share similar attributes in many aspects. Thus this view will help the user to search for groupings in the dataset as well as to detect subtle variances within each group of data points. The other data points, which are considered outliers, may also be readily perceived if most data falls within a small number of groups.

6.2 Clutter Measure in Dimensional Stacking

6.2.1 Clutter Definition

We define the clutter measure as the proportion of occupied bins aggregated with each other versus small isolated "islands", namely the filled bins without any neighbors around them. A measure of clutter might then be

$$\frac{\text{number of isolated filled bins}}{\text{number of total occupied bins}}$$

The dimension order which minimizes this number will then be considered the best order. Besides that, we also need to define which bins are considered neighbors. The choices are 4-connected bins and 8-connected bins. With 4-connected neighbors, the points considered aggregated will share the same data range on all but one dimension, while the 8-connected bins may fall into different data ranges on at most two dimensions. And since they are connected, their values on those dimensions have to fall into immediately neighboring value ranges.

6.2.2 Algorithm for Computing Clutter

Given a dimension order, our approach will search for all filled bins that are connected to neighbors and calculate clutter according to the above clutter measure. The dimension order that minimizes this number is considered the best ordering.

The algorithm is similar to that used with high-cardinality dimensions in scatterplot matrices. However, we are comparing the positions of bins instead of plots. The computational complexity is $O(m^2)$ for one dimension order, and the optimal search takes $O(m^2 \cdot n!)$.

6.3 Example

An example of clutter reduction in dimensional stacking is given in Figure 7. We have defined 8-connected as our measure for neighbor.

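The isolated-bin ratio from Section 6.2.1 can be sketched in Python as below. This is an illustrative sketch rather than the authors' implementation: the names `stack_position` and `stacking_clutter` are hypothetical, dimensions are assumed to alternate between the horizontal and vertical axis from outermost to innermost, and neighbors are taken as adjacent cells on the screen (including across outer grid borders), which is a simplification. The three records are made-up data.

```python
def stack_position(bins, order, n_bins):
    """Map one record's per-dimension bin indices to a (col, row) screen cell.

    Assumption: order[0], order[2], ... go on the horizontal axis and
    order[1], order[3], ... on the vertical axis, outermost dimension first.
    """
    col = row = 0
    for k, dim in enumerate(order):
        if k % 2 == 0:
            col = col * n_bins[dim] + bins[dim]
        else:
            row = row * n_bins[dim] + bins[dim]
    return col, row


def stacking_clutter(records, order, n_bins, connectivity=8):
    """Fraction of filled bins that have no filled neighbor (isolated 'islands')."""
    filled = {stack_position(b, order, n_bins) for b in records}
    if not filled:
        return 0.0
    if connectivity == 4:
        offsets = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    else:  # 8-connected: also count diagonal neighbors
        offsets = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1)
                   if (dx, dy) != (0, 0)]
    isolated = sum(
        1 for (c, r) in filled
        if not any((c + dx, r + dy) in filled for dx, dy in offsets)
    )
    return isolated / len(filled)


# Three made-up records, each given as bin indices on four dimensions (3 bins each).
n_bins = [3, 3, 3, 3]
records = [(0, 0, 0, 0), (0, 0, 1, 0), (2, 2, 2, 2)]
print(stacking_clutter(records, (0, 1, 2, 3), n_bins))
print(stacking_clutter(records, (2, 1, 0, 3), n_bins))
```

In the toy data the first two records differ only on dimension 2, so they land in adjacent cells when dimension 2 is an inner (fast) dimension and drift apart when it is moved to the outermost position.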
Figure 7: Dimensional stacking visualization for the Iris dataset. Fig. 7-(a) represents the data with the original dimension order, and Fig. 7-(b) shows the data with clutter reduced.

Table 1: Computation times using the optimal ordering algorithm

Visualization          Algorithm Complexity   Dataset         Data Number   Dimensionality                  Time
Parallel Coordinates   O(n * n!)              AAUP-Part       1161          9                               3 secs
                                              Cereal-Part     77            10                              23 secs
                                              Voy-Part        744           11                              4:02 mins
Scatterplot Matrices   O(n^2 * n!)            Voy-Part        744           11 (6 high-card dimensions)     5 secs
                                              AAUP-Part       1161          9                               3:13 mins
Star Glyphs            O(m * n * n!)          Cars            392           7                               18 secs
Dimensional Stacking   O(m^2 * n!)            Coal Disaster   191           5                               10 secs
                                              Detroit         13            7                               2:10 mins

Fig. 7-(a), denoting the original data order, is composed of many "islands", namely the filled bins without any neighbors next to them. In Fig. 7-(b), the optimal ordering, there are far fewer "islands", resulting in an easier interpretation.

7 ANALYSIS OF REORDERING ALGORITHMS

As stated previously, the clutter measuring algorithms for the four visualization techniques take different amounts of time to complete. Let $m$ denote the data size, and $n$ denote the dimensionality. The computational complexity of measuring clutter in the four techniques is presented in Table 1.

Ideally, we would hope to use an exhaustive search to find the best dimension order that minimizes the total clutter in the display. However, in [2], Ankerst et al. proved that an optimal search for the best dimension order is an NP-complete problem, equivalent to the Traveling Salesman Problem. Therefore, we can do the optimal search only with low dimensionality datasets. To get a quantitative understanding of this issue, we did a few experiments for different visualizations, and the results we obtained are presented in Table 1.

We realized that even in a low dimensional data space, around 10 dimensions, the computational overhead can be significant. If the number of dimensions exceeds that, we need to resort to heuristic approaches. For example, we have implemented random swapping, nearest-neighbor and greedy algorithms.

The random swapping algorithm starts with an initial configuration and randomly chooses two dimensions to switch their positions. If the new arrangement results in less clutter, then this arrangement is kept and the old one is rejected; otherwise we leave the old arrangement intact and go on swapping another pair of dimensions. This is repeated until no better result is generated for a certain number of swaps. This algorithm can be applied to all the visualization techniques.

The nearest-neighbor algorithm starts with an initial dimension, finds its nearest neighbor, and adds the new dimension into the tour. Then, it sets the new dimension to be the current dimension for searching neighbors. This continues until all the dimensions are added into the tour. The greedy algorithm [3] keeps adding the nearest possible pairs of dimensions into the tour, until all the dimensions are in the tour.

The nearest-neighbor and greedy algorithms are good for parallel coordinates and scatterplot matrices displays, because in those displays there is some overall relationship between dimensions that can be calculated, such as the number of outliers between dimensions and the correlation between dimensions. However, in the star glyph and dimensional stacking visualizations, we don't have a direct measure of dimension relationship. Thus these algorithms are not very amenable to the latter two techniques.

With heuristic algorithms, we can perform dimension reordering on datasets of much higher dimensionality with relatively good results. Experimental results are presented in Table 2.

Table 2: Computation times using heuristic algorithms

Visualization          Dataset         Data Number   Dimensionality   Algorithm                    Time
Parallel Coordinates   Census-Income   200           42               Nearest-Neighbor Algorithm   2 secs
                                                                      Greedy Algorithm             3 secs
                                                                      Random Swapping              2 secs
                       AAUP            1161          14               Nearest-Neighbor Algorithm   7 secs
                                                                      Greedy Algorithm             9 secs
                                                                      Random Swapping              6 secs
Scatterplot Matrices   Census-Income   200           42               Nearest-Neighbor Algorithm   2 secs
                                                                      Greedy Algorithm             3 secs
                                                                      Random Swapping              2 secs
                       AAUP            1161          14               Nearest-Neighbor Algorithm   8 secs
                                                                      Greedy Algorithm             8 secs
                                                                      Random Swapping              7 secs
Star Glyphs            Census-Income   200           42               Random Swapping              2 secs
                       AAUP            1161          14               Random Swapping              7 secs
Dimensional Stacking   Those datasets are too big for dimensional stacking visualization.

8 CONCLUSION AND FUTURE WORK

In this paper, we have proposed the concept of visual clutter reduction using dimension reordering in multi-dimensional visualization. We studied four rather distinct visualization techniques for clutter reduction. For each of them, we analyzed its characteristics and then defined an appropriate measure of visual clutter. In order to obtain the least clutter, we then used reordering algorithms to search for a dimension order that minimizes the clutter in the display.

This represents a first step into the field of automated clutter reduction in multi-dimensional visualization. There are many visualization techniques that we haven't experimented with yet, and certainly our clutter measures are not the only ones possible. Our hope is to give users the ability to generate views of their data that will enable them to discover structure that they would otherwise not find in a view with the original or a random dimension order.

Future work will include the combination of clutter reduction approaches with dimension reduction or hierarchical data visualization, to gauge the effectiveness of these techniques on high-dimensional or high-volume datasets. In this paper, we only discussed the use of dimension order for reducing clutter. However, there are certainly other visual aspects that affect clutter or structure in a display and thus can help facilitate the interpretation of a visualization.

REFERENCES

[1] D. F. Andrews. Plots of high dimensional data. Biometrics, 28:125–136, 1972.
[2] M. Ankerst, S. Berchtold, and D. A. Keim. Similarity clustering of dimensions for an enhanced visualization of multidimensional data. Proc. IEEE Symposium on Information Visualization, pages 52–60, 1998.
[3] Thomas H. Cormen, Charles E. Leiserson, and Ronald L. Rivest. Introduction to Algorithms. MIT Press, 1990.
[4] S. L. Crawford and T. C. Fall. Projection pursuit techniques for visualizing high-dimensional data sets. Visualization in Scientific Computing, (G. M. Nielson and B. Shriver, eds.), pages 94–108, 1990.
[5] Y. Fua, M. O. Ward, and E. A. Rundensteiner. Hierarchical parallel coordinates for exploration of large datasets. Proc. IEEE Visualization, pages 43–50, Oct. 1999.
[6] A. Inselberg and B. Dimsdale. Parallel coordinates: A tool for visualizing multidimensional geometry. Proc. IEEE Visualization, pages 361–378, 1990.
[7] Anil K. Jain and Richard C. Dubes. Algorithms for Clustering Data. Prentice-Hall, Inc., 1988.
[8] J. Jolliffe. Principal Component Analysis. Springer Verlag, 1986.
[9] D. A. Keim. Pixel-oriented visualization techniques for exploring very large databases. Journal of Computational and Graphical Statistics, 5(1):58–77, 1996.
[10] T. Kohonen. The self-organizing map. Proc. IEEE, pages 1464–1480, 1978.
[11] J. B. Kruskal and M. Wish. Multidimensional Scaling. Sage Publications, 1978.
[12] J. LeBlanc, M. O. Ward, and N. Wittels. Exploring n-dimensional databases. Proc. IEEE Visualization, pages 230–237, 1990.
[13] Y. K. Leung and M. D. Apperley. A review and taxonomy of distortion-oriented presentation techniques. ACM Transactions on Computer-Human Interaction, 1(2):126–160, 1994.
[14] R. J. Littlefield. Using the glyph concept to create user-definable display formats. Proc. NCGA, pages 697–706, 1983.
[15] S. Y. Lu and K. S. Fu. A sentence-to-sentence clustering procedure for pattern analysis. IEEE Transactions on Systems, Man and Cybernetics, 8:381–389, 1978.
[16] M. Sheelagh, T. Carpendale, D. J. Cowperthwaite, and F. D. Fracchia. Distortion viewing techniques for 3-dimensional data. Proc. IEEE Symposium on Information Visualization, pages 46–53, 1996.
[17] J. H. Siegel, E. J. Farrell, R. M. Goldwyn, and H. P. Friedman. The surgical implication of physiologic patterns in myocardial infarction shock. Surgery, 72:126–141, 1972.
[18] C. Stolte and P. Hanrahan. Polaris: A system for query, analysis, and visualization of multidimensional relational databases. Proc. IEEE Symposium on Information Visualization, pages 5–14, 2000.
[19] J. W. Tukey, M. A. Fisherkeller, and J. H. Friedman. Prim-9: An interactive multidimensional data display and analysis system. Dynamic Graphics for Statistics, (W. S. Cleveland and M. E. McGill, eds.), pages 111–120, 1988.
[20] C. Ware. Information Visualization: Perception for Design. Harcourt Publishers Ltd, 2000.
[21] S. L. Weinberg. An introduction to multidimensional scaling. Measurement and Evaluation in Counseling and Development, 24:12–36, 1991.
[22] G. J. Wills. An interactive view for hierarchical clustering. Proc. IEEE Symposium on Information Visualization, pages 26–31, 1998.
[23] P. C. Wong and R. D. Bergeron. Multiresolution multidimensional wavelet brushing. Proc. IEEE Visualization, pages 141–148, 1996.
[24] Xmdvtool home page. http://davis.wpi.edu/xmdv/. http://davis.wpi.edu/~xmdv.
[25] J. Ya
ng,W.Peng,M.O.Ward,andE.A.Rundensteiner.Interactivehierarchicaldimensionordering,spacingandlteringforexplorationofhighdimensionaldatasets.Proc.IEEESymposiumonInformationVisualization,pages105–112,2003.[26]J.Yang,M.O.Ward,E.A.Rundensteiner,andS.Huang.Visualhi-erarchicaldimensionreductionforexplorationofhighdimensionaldatasets.Eurographics/IEEETCVGSymposiumonVisualization,pages19–28,2003.