BANKS: Browsing and Keyword Searching in Relational Databases

B. Aditya, Gaurav Bhalotia, Soumen Chakrabarti, Arvind Hulgeri, Charuta Nakhe, Parag, S. Sudarshan
Computer Science and Engg. Dept., I.I.T. Bombay
{baditya,soumen,aru,parag,sudarsha}@cse.iitb.ac.in  bhalotia@eecs.berkeley.edu  charuta@pspl.co.in

Abstract

The BANKS system enables keyword-based search on databases, together with data and schema browsing. BANKS enables users to ...

(Partly supported by an IBM Faculty Fellowship grant and ...)

[Figure 1: A Fragment of the DBLP Database. The figure shows a Paper tuple (PaperId ChakrabartiSD98, PaperName "Mining Surprising Patterns Using Temporal Description Length"), Writes tuples (SunitaS, ChakrabartiSD98) and (ByronD, ChakrabartiSD98), and Author tuples (AuthorId SunitaS, AuthorName Sunita Sarawagi) and (AuthorId ByronD, AuthorName Byron Dom), among others.]

... BANKS, including relevance score computation, is described in more detail in Section 2.

BANKS provides a rich interface to browse data, with automatic generation of hyperlinks. The browsing component of BANKS is described in more detail in Section 3.

The BANKS system is developed in Java using servlets and JDBC, and can be run on any database, without any programming. We are also developing a version of BANKS that handles XML data.

The greatest value of BANKS lies in almost zero-effort Web publishing of relational data which would otherwise remain invisible to the Web [2]. For example, BANKS may be used to publish organizational data, bibliographic data, and electronic catalogs. A demo of the BANKS system is accessible over the Web at the URL: http://www.cse.iitb.ac.in/banks/

2 Keyword search

Consider a fragment of a bibliographic database shown in Figure 1. This database contains paper titles, their authors and citations extracted from the DBLP repository. As we can see, due to normalization, information about a single paper is distributed across seven tuples related through foreign key references. A user looking for this paper may use queries like "sunita temporal" or "soumen sunita".

The BANKS system models the database as a directed graph, with each tuple in the database corresponding to a node in the graph. Each foreign-key
primary-key link is modeled as a directed edge between the corresponding tuples. (This can be easily extended to other types of connections.)

A keyword query in BANKS consists of n >= 1 search terms t_1, ..., t_n. The first step is to locate the set of nodes matching search terms; a node matches a search term if it contains the search term as part of an attribute value or metadata (such as column, table or view names). Let S_i denote the set of nodes matching search term t_i; a node may match more than one search term, so the S_i's may overlap.

[Figure 2: Result of query "soumen sunita"]

Intuitively, an answer to a query is a subgraph connecting some set of nodes that cover the keywords, i.e., each keyword must match one of the nodes in that set. Just by looking at a subgraph it may not be apparent what information it conveys. We wish to also identify a central node in the subgraph, one that connects all the keyword nodes and strongly reflects the relationship amongst them.

An answer to a query is therefore modeled as a rooted directed tree containing at least one node from each S_i; edges are directed away from the root. The motivation for directionality is outlined later in this section. Note that the tree may also contain nodes not in any S_i and is therefore a Steiner tree. Figure 2 shows a sample result of a query containing the keywords soumen and sunita executed on the bibliographic database. Indentation is used to depict the tree structure, and nodes containing keywords are distinguished by their color.

2.1 Answer Relevance

In general, the importance of a link depends upon the type of the link, i.e., what relations it connects and on its semantics; for example, in a bibliographic database, the link between the Paper table and the Writes table is seen as a stronger link than the link between the Paper table and the Cites table. The link between the Paper and Cites tables can correspondingly be given a higher weight. The weight of a tree is proportional to the total of its edge weights, and the relevance of a tree is inversely related to its weight, so a higher edge weight marks a weaker link.

The example in Figure 1 illustrates that some links point toward the root of the tree, instead of away from the root as required by our model. For instance, the Writes relation has foreign keys to the Paper and Author relations, whereas we require paths from Paper
to Author, traversing a foreign key edge in the opposite direction.

However, we cannot simply regard the edges as undirected. Ignoring directionality would cause problems because of "hubs" that are connected to a large number of nodes. For example, in a university database a department with a large number of faculty and students would act as a hub. As a result, many nodes would be within a short distance of many other nodes, reducing the effectiveness of the tree-weight based scoring mechanism.

To solve the problem, we create for each edge (u, v) a backward edge (v, u); in the example from Figure 1, the backward edges ensure that there is a directed tree rooted at the paper, with a path to each leaf. We set the weight of (v, u) to the weight of (u, v) multiplied by a function of the number of links to v from the nodes of the same type as u. Experiments with different functions indicated that the function log(1 + x), where x is the number of inlinks, provided good results [3]. (If there was already an edge from v to u, we set the edge weight to the lower of the original edge weight and the weight computed above.)

BANKS incorporates another interesting feature, namely node weights, inspired by prestige rankings such as PageRank in Google [4]. With this feature, nodes that have multiple pointers to them get a higher node weight (higher node weight corresponds to higher prestige). E.g., in a bibliography database containing citation information, if the user gives a query "Query Optimization" our technique would give higher prestige to the papers with more citations. As another example, in a TPCD database storing information about parts, suppliers, customers and orders, the orders information contains references to parts, suppliers and customers. As a result, if a query matches two parts (or suppliers, or customers) the one with more orders would get a higher prestige.

In the current implementation we set the node weight to a function of the in-degree of the node. We experimented with different functions, and got good results with the function log(1 + x), where x is the in-degree. Our use of logarithms in edge and node weights is similar to term weighting schemes in information retrieval.

Node weights and tree weights are combined to get an overall relevance score. We experimented with additive and multiplicative combinations, and found that both worked well when the relative weights for the two scores were appropriately chosen. Details of the search algorithm and the relevance computation, along with a preliminary performance study, can be found in [3].

Although a few other systems implement keyword search on databases (e.g., [5, 1, 6]), BANKS differs from all prior work in several ways: notably, in the techniques for edge weight computation and prestige based ranking, and the use of an in-memory graph structure for very efficient search while keeping the bulk of the data disk resident. The connections of BANKS to related work are described in more detail in [3].

2.2 Extensions

The BANKS system supports iterative refinement of queries:

• If multiple nodes match a keyword, the user can select one or more nodes as being relevant and ignore others; as an example, two authors in the DBLP database match the keyword sudarshan, and the user can choose one of them and execute a query matching it with other keywords.
• Users can request more answers similar to one of the displayed answers; similarity can be defined on the basis of the answer tree structure.
• Other refinements to tune node and edge weights are also under development.

Instead of displaying trees consisting of explicit tuples, system developers can specify answer formats based on the type of the root of the answer trees. For instance, one can specify that author, conference/journal and year be displayed whenever the root node is from the paper relation. We are currently working on implementing answer formatting, and on supporting negation and disjunction in queries.

3 Browsing

The BANKS system provides a rich interface to browse data stored in a relational database. The system automatically generates browsable views of database relations and query results; no content programming or user intervention is required.

Every displayed foreign key attribute value becomes a hyperlink to the referenced tuple. In addition, primary key columns can be browsed backwards, to find referencing tuples, organized by referencing relations (users can select a specific referencing relation).

Each table displayed comes with a variety of tools for interacting with data. Columns can be projected away (dropped), and selections can be imposed on any column. For foreign key columns, clicking on join results in the referenced table being joined in, and its columns also displayed. This eliminates the need for explicitly writing join queries for the normal case of foreign key join. The join feature can also be used in the other direction, from a primary key to a referencing foreign key.

Results can be grouped by a column; this results in only the distinct values for that column being displayed. The user can click on any of the values to see the tuples associated with that value. Tuples in the displayed table can be sorted by a specified column.

[Figure 3: Browsing Examples: (a) Sample browsing session (b) Pie chart]

Controls for these operations can be accessed by clicking on the column names in the table header. In addition, displayed data is paginated, and schema browsing is supported.

Figure 3(a) shows the result of browsing the thesis database starting with the student relation, using a pop-up menu on the roll number attribute to effect a join with the thesis relation and dropping several columns. The join is made possible since the thesis relation has a foreign-key attribute referencing the student relation. A sample pop-up menu is shown for the femail attribute, which references the faculty table.

Hyperlinks in the displayed data are automatically generated by the system. Each hyperlink corresponds to an SQL query that is executed when a user clicks on the link. Thus, all the pages in the system are generated on the fly by executing corresponding queries against the underlying relational database. No precomputation is required.

BANKS templates provide several predefined ways of displaying data. Template instances are customized, stored in the database, and given a hyperlink name, which is used to access the template. The BANKS system currently provides four types of templates:

• Cross-tabs (similar to OLAP cross-tabs), with drill-down facilities.
• The group-by template provides a hierarchical view of data, by specifying a sequence of grouping attributes.
• Folder-tree views, which provide another hierarchical view of data.
• The graphical interface template permits information to be displayed in bar chart, line chart or pie chart format. Hyperlinks are provided on the graphical data via HTML image maps, to allow drill down on the data. Figure 3(b) shows an example pie chart generated by BANKS.

Another interesting feature of templates is that they can be composed together in a hyperlinked, visual manner. Several example templates are available on the BANKS web site.

4 Conclusions

To summarize, we have developed an integrated system for keyword searching and browsing of databases. The system has many useful features which allow casual users to access database information in an intuitive manner. BANKS enables almost effortless Web publishing of relational and XML data that would otherwise remain (at least partially) invisible to the Web.

We have also developed a prototype version of BANKS that works on XML data, supporting keyword searching and scalable browsing of large XML datasets. We are working on integrating XML search/browsing with the rest of the BANKS system.

References

[1] Sanjay Agrawal, Surajit Chaudhuri, and Gautam Das. DBXplorer: A system for keyword-based search over relational databases. In Procs. ICDE, Feb. 2002.
[2] Peter Bailey, Nick Craswell, and David Hawking. Dark matter on the Web. In Poster Proceedings, 9th World-Wide Web Conference, 2000.
[3] Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In Procs. ICDE, Feb. 2002.
[4] Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual Web search engine. Computer Networks and ISDN Systems, 30(1-7), 1998.
[5] Shaul Dar, Gadi Entin, Shai Geva, and Eran Palmon. DTL's DataSpot: Database exploration using plain language. In Procs. VLDB, 1998.
[6] Vagelis Hristidis and Yannis Papakonstantinou. DISCOVER: Keyword search in relational databases. In Procs. VLDB, Aug. 2002.
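
Illustrative sketch for Section 2.1

As a rough illustration of the edge and node weighting described in Section 2.1, the following Python sketch builds backward edges for the Figure 1 fragment and computes the weight of the answer tree rooted at the paper. It is not the BANKS implementation (which is written in Java). Several choices here are assumptions made only for the example: nodes are represented as hypothetical (relation, key) pairs, every foreign-key edge gets a uniform weight of 1.0 (the paper lets weights depend on the link type), the logarithm is taken base 2 (the text only says log(1 + x)), and the function names are invented.

```python
import math
from collections import defaultdict


def add_backward_edges(forward, log=math.log2):
    """Return forward edges plus one backward edge (v, u) per edge (u, v).

    forward maps (u, v) -> weight for each foreign-key edge u -> v.
    Nodes are (relation, key) pairs, so u[0] is the node's type.
    The log base is an assumption; the paper only states log(1 + x).
    """
    # Count links into v from nodes of each type.
    inlinks = defaultdict(int)
    for (u, v) in forward:
        inlinks[(v, u[0])] += 1

    edges = dict(forward)
    for (u, v), w in forward.items():
        back = w * log(1 + inlinks[(v, u[0])])
        # If an edge v -> u already exists, keep the lower weight.
        edges[(v, u)] = min(edges.get((v, u), back), back)
    return edges


def node_prestige(forward, log=math.log2):
    """Node weight as log(1 + in-degree), counted over foreign-key edges."""
    indegree = defaultdict(int)
    for (_, v) in forward:
        indegree[v] += 1
    nodes = {n for edge in forward for n in edge}
    return {n: log(1 + indegree[n]) for n in nodes}


def tree_weight(edges, tree_edges):
    """Total weight of an answer tree given as a list of directed edges."""
    return sum(edges[e] for e in tree_edges)


# The Figure 1 fragment: two Writes tuples reference one paper and two authors.
paper = ("Paper", "ChakrabartiSD98")
sunita = ("Author", "SunitaS")
byron = ("Author", "ByronD")
w_sunita = ("Writes", "SunitaS/ChakrabartiSD98")
w_byron = ("Writes", "ByronD/ChakrabartiSD98")

forward = {
    (w_sunita, paper): 1.0,
    (w_sunita, sunita): 1.0,
    (w_byron, paper): 1.0,
    (w_byron, byron): 1.0,
}

edges = add_backward_edges(forward)
prestige = node_prestige(forward)

# Answer tree rooted at the paper, with a path to each author.
answer = [(paper, w_sunita), (w_sunita, sunita), (paper, w_byron), (w_byron, byron)]
weight = tree_weight(edges, answer)
```

In this sketch the paper has two inlinks from Writes tuples, so each backward edge leaving it costs more than the backward edge leaving an author with a single inlink. A hub with many inlinks becomes more expensive still, which is the effect the backward-edge weighting is meant to achieve.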