Collaborative Data Publishing and Searching System

Beng Chin OOI, Bei YU, and Guoliang LI
National University of Singapore, Tsinghua University

Abstract: In this paper, we present a folksonomy-based collaborative data publishing and searching system. The system accepts data objects described with user-created metadata, called data units. The system supports flexible structure on the data units, and places no restrictions on the vocabulary used. We devise a generic table model for storing and representing the data units of various structures. We propose a framework for managing the data units and providing browsing, searching and querying services over them. We present our current approaches and discuss relevant research issues.

Index Terms: folksonomy, tagging, clustering

Digital information publishing and searching has become increasingly necessary in recent years, due to the popularity of Internet services. We have witnessed the growth of a number of such web services, including Google Base [3], Delicious [1], Liveplasma [4], and Flickr [2]. While these systems vary in the particular services they provide, they share the same operation mode: users publish data items such as
URLs, pictures, advertisements, etc., that are associated with simple descriptions, such as tags and attributes, created by the users themselves. The system organizes the published data items based on the users' own descriptions and makes them accessible. For example, Delicious allows a user to bookmark URLs on its server and requires each URL to have a set of user-provided tags (a.k.a. keywords). In some sense, the users collaboratively contribute tags to the system, which are then used to categorize the URLs to facilitate webpage searching (by browsing or querying the tags). Such a collaborative but unsophisticated way of organizing information with user-created metadata has been termed folksonomy (a combination of "folk" and "taxonomy"), and such systems are sometimes called collaborative tagging systems [9].

A distinguishing feature of these systems is that the user plays the dominant role: the collaborative behavior of the users decides the data semantics and organization of the systems. Although the lack of a controlled vocabulary and of a systematic taxonomy of concepts makes the classification of the data objects in these systems imprecise and imperfect, they are convenient to users and easy to deploy, and, more importantly, they can adapt to the dynamic changes of Web content.

Besides simple tags, richer and more flexible data structures could be used to describe a published item, giving the user more expressive power. For example, Google Base allows users to define their own attributes, and to describe their published objects with a variable number of attributes. The freedom from strict syntax for the published data items is very convenient to users. However, it is a challenging task to organize and classify data items with variable "schemas" and topics in order to make them searchable. In Google Base, a list of types such as products and recipes is provided, and users are encouraged but not required to publish their data items under a specific type. The way Google Base stores and organizes the data items is unknown, and it takes 15 to 60 minutes to make newly published data items searchable on its website.

In this paper, we describe our attempt to build a general system framework for supporting collaborative publishing and searching services. The system accepts data objects described with user-created metadata, stores and classifies them, and provides various
querying and searching interfaces such as browsing, keyword search, and structured query. The metadata created by users serves both the users themselves (for labeling the information they publish to the system) and the system (for organizing all the published information).

The data unit, which users use to describe published information of any kind, consists of a title, a number of fields, and a set of tags. Basically, a field is an attribute/value pair characterizing a certain property of an object, e.g., color:yellow for a puppy. A tag is a word or phrase the user uses as a "keyword" to characterize the published object. For example, the tags of a puppy could be animal, dog, etc. [9] discusses the types of tags a user uses to label URL bookmarks on the Delicious website. Figure 1 shows two example data units that describe different types of information. To illustrate, the left data unit describes the blog of uzzer. It uses four fields to show the location, author, type, and language of the blog. In addition, it has tags that are "keywords" of the blog.

Fig. 1. Examples of data units.

Within our system framework, we propose a data model for storing and representing the collection of data units accepted from users. It includes a single generic table for storing the data units, and a set of virtual relations, defined as views of the generic table, for representing the different topics of the data stored in the generic table. The user browses and queries over the virtual views, and the system retrieves results from the generic table. Our generic table model differs from the universal relation model [10], [8] proposed and studied earlier in two main points. First, the universal relation is designed for logical representation of an application domain in order to free the user from
dealing with specific access paths when issuing queries, while our generic table is the schema for physically representing and storing the tuples, and is not visible to the user. Second, whereas the universal relation schema describes a specific application domain, our generic table model is meant to store data from all types of domains comprehensively.

Our proposed system framework also includes a data unit categorizer that dynamically clusters and assigns incoming data units into various virtual relations according to their different topics and structures, a multi-function query processor for dealing with various kinds of queries, and a storage manager for storing and indexing the data units, whose volume may grow quickly depending on the popularity of the system. Figure 2 illustrates the architecture of our system.

Fig. 2. System architecture. (VR is short for Virtual Relation.)

The data model in our system is a generic relational table and a set of virtual relations that are views over the generic table. Data units published to the system are all physically stored in the generic table. Unlike in traditional relational database design, the schema of the generic table in our system is designed in an ad hoc and dynamic way. It contains a set of fixed attributes and a set of non-fixed attributes. Fixed attributes are defined by the system, while non-fixed attributes are dynamically inserted according to the data units published to the system. That is, its schema is collaboratively decided by users. The single-table model is suitable for storing the data units in our system because there are no predictable data dependencies among attributes collaboratively defined by a mass of users.

Definition 1: The generic
relational table schema is an expression of the form $R(U)$, where $R$ is the name of the table and $U$ is the set of attributes such that $U = F \cup N$, where $F = \{A_1, \ldots, A_k\}$ is the finite set of fixed attributes and $N = \{A_{k+1}, A_{k+2}, \ldots\}$ is the infinite set of non-fixed attributes.

The domains of the attributes in $U$ are initially undefined; they are assumed to be infinite, and values are stored in string format. Once the generic table is populated with a sufficient number of tuples, we can use a machine learning approach to learn the patterns and statistics of the values in order to determine the domains of the attributes.

In our system, the fixed attribute set $F$ contains id, author, title, and tag, where id is created by the system when a new data unit is inserted and is the primary key of the table, author and title correspond to the person who publishes the data unit and to its title, and tag is a textual attribute storing all the tags listed in the data unit. When the system initializes, the non-fixed attribute set $N$ is empty. More and more attributes are automatically added to it as data units having new attributes are published to the system. Conversely, when obsolete data units are deleted from the system, some attributes may also be removed from $N$.

The generic table is the schema for storing data units in our system, but it cannot be semantically meaningful to users, because the data units stored in it are very diverse in their topics. In our system, the user poses queries in terms of a set of virtual relations, which are defined as views of the generic table. A virtual relation schema is mapped to $F$ and a subset of the attributes in $N$ of the generic table. The set of virtual relations defined in the system is called the virtual schema (or semantic schema) of the generic table.
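The generic table described above can be illustrated with a small in-memory sketch. This is only an illustration under stated assumptions, not the system's storage layer: the class `GenericTable`, its methods, and the sample data are hypothetical names introduced here, an absent dictionary key stands in for a null value, and a virtual relation is modeled as a plain projection over an explicitly supplied list of tuple ids.

```python
from itertools import count

# Fixed attributes named in the text; everything else is non-fixed.
FIXED = ("id", "author", "title", "tag")


class GenericTable:
    """Toy in-memory stand-in for the generic table (hypothetical API)."""

    def __init__(self):
        self.non_fixed = []      # non-fixed attributes, in order of first appearance
        self.rows = {}           # id -> {attribute: value}; an absent key means null
        self._ids = count(1)     # system-created primary keys

    def schema(self):
        return list(FIXED) + self.non_fixed

    def insert(self, author, title, tags, fields):
        """Store one data unit; unseen field names extend the schema."""
        for name in fields:
            if name not in FIXED and name not in self.non_fixed:
                self.non_fixed.append(name)
        uid = next(self._ids)
        row = {"id": uid, "author": author, "title": title, "tag": " ".join(tags)}
        row.update({k: v for k, v in fields.items() if k not in FIXED})
        self.rows[uid] = row
        return uid

    def delete(self, uid):
        """Remove a tuple, then drop attributes that are null in every remaining tuple."""
        del self.rows[uid]
        self.non_fixed = [a for a in self.non_fixed
                          if any(a in r for r in self.rows.values())]

    def view(self, attributes, ids):
        """Virtual relation: fixed attributes plus chosen non-fixed ones, for the given tuples."""
        cols = list(FIXED) + [a for a in attributes if a in self.non_fixed]
        return [{c: self.rows[i].get(c) for c in cols} for i in ids]


table = GenericTable()
blog = table.insert("alice", "A blog", ["blog", "tech"],
                    {"homepage": "http://example.org",
                     "blog type": "personal", "language": "English"})
news = table.insert("bob", "A headline", ["news"],
                    {"news source": "example wire", "publish date": "2006-01-01"})
print(table.schema())
print(table.view(["homepage", "language"], [blog]))
```

Storing each row as a dictionary of its non-null values keeps the sketch short; it says nothing about how a real deployment would lay out such a sparse table on disk.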

Definition 2: A virtual relation $V$ of the generic table $R$ is defined as a view with query $Q$ over the instance $r$ of $R$, denoted as $V \mapsto Q(r)$.

In the generic table, null values are allowed. A null is interpreted as inapplicable, and operations on null values are the same as in the traditional relational model. In particular, if a tuple has null values in all of its columns, it is called a null tuple. If the instance $r$ of a relation contains only null tuples, $r$ is regarded as null, i.e., $r = null$.

Definition 3: Given two (virtual) relations $R_1$ and $R_2$ and their tuples $t_1$ and $t_2$, the two tuples are equivalent, denoted as $t_1 \equiv t_2$, if they have the same values on the attributes common to both relations and null values on all other attributes.

When the system adds an attribute to $U$ of the generic table, all tuples of the current instance are assigned null values for the new attribute, and they remain equivalent to their original forms.

Definition 4: Given an instance $r$ of $R(U)$, an attribute $A \in U$ is unnecessary if $\pi_A(r) = null$.

Unnecessary attributes may appear in the generic table or in virtual relations when obsolete tuples are deleted, and they need to be removed automatically.

The basic operations on the generic table and virtual relations include selection ($\sigma$), projection ($\pi$), join ($\bowtie$), union ($\cup$), and difference ($-$), which are the same as in the traditional relational model. Based on the basic operations, we define an augmented selection operator called reducible selection, denoted as $\tilde{\sigma}$. Given a relation $R(U)$ and its instance $r$, and supposing $A \in U$ and $a \in dom(A)$, $\tilde{\sigma}_{A=a}$ takes $r$ as input and returns the relation $\pi_{U - X}(\sigma_{A=a}(r))$, where $X$ is the set of all unnecessary attributes of $\sigma_{A=a}(r)$. That is, the relation resulting from a reducible selection has no unnecessary attributes.

Theorem 1: Supposing $\theta$ is a selection condition on a subset of the columns of relation $R$ whose instance is $r$, the two selection operations $\sigma_{\theta}(r)$ and $\tilde{\sigma}_{\theta}(r)$ produce two relations consisting of equivalent tuples.

In addition to ordinary comparison
operators such as $=$, $<$, and $>$, our data model also includes text-based comparisons, since most data units have many textual fields, e.g., the tags attribute. Given a set of keywords $K$, a relation $R$ and its instance $r$, and a subset $X$ of textual attributes, for each tuple $t \in r$ the textual comparison operator $match(t[X], K)$ returns a score in $[0, 1]$ indicating the relevance of the tuple to the keywords. The match operator relies on a full-text index on the keywords in the values of the textual attributes. Given a threshold $\tau$, $\sigma_{match(X, K) > \tau}(r)$ returns all tuples in $r$ whose match scores are higher than $\tau$. A query over the generic table and virtual relations is constructed by combining the basic operators.

Definition 5: A virtual schema $\{V_1, \ldots, V_n\}$ is complete in terms of an instance $r$ of the generic table $R$ if

$$r = Q_1(r) \;⟗\; Q_2(r) \;⟗\; \cdots \;⟗\; Q_n(r),$$

where ⟗ denotes outer join, and $Q_i(r)$ ($1 \le i \le n$) is the instance of $V_i$ generated by evaluating query $Q_i$, which is the view defined for $V_i$ over $r$. With a complete virtual schema, the content stored in the generic table can be fully exposed to users.

As a collaborative publishing and searching system, our system initializes with an empty generic table AllUnits with system-created attributes, defined as AllUnits(id, author, title, tag). As users publish data units to
the system, the generic table is populated with more and more tuples, and consequently it acquires more and more attributes. When storing a data unit in the generic table, we represent it as a tuple according to the schema of the generic table. If the data unit defines attributes that are not in AllUnits, the system adds those attributes to the schema during the insertion of the new tuple. For example, if upon system initialization the two data units shown in Figure 1 are published to the system one at a time, the resulting generic table schema is AllUnits(id, title, author, tags, homepage, blog type, language) after the first data unit is inserted, and it becomes AllUnits(id, title, author, tags, homepage, blog type, language, news source, publish date) after the second one is inserted. A data unit becomes a tuple in the generic table after it is stored in the system. In the rest of the paper, we use data unit and tuple interchangeably to refer to the tuples stored in the tables.

The generic table is maintained by the storage manager component (Figure 2), and it is not visible to the user. Users access the stored data through the virtual relations over the generic table, which are built and updated incrementally as new data units are published to the system. Each virtual relation should represent a semantic category meaningful to the user. For example, a virtual relation blog(blog name, blog type, homepage) represents a category of data units describing blogs. Since the domain of published data units is unrestricted, constructing the virtual relations amounts to incrementally clustering the incoming data units into various virtual relations. This task is performed by the data unit categorizer depicted in Figure 2.

Therefore, our system needs to perform several actions when accepting a new data unit. First, the data unit categorizer either assigns the new data unit to an existing virtual relation, or creates a new virtual relation with the new data unit as its only tuple. Then, the data unit is passed to the storage manager, which actually inserts the new data unit into AllUnits and, if the new tuple changes the schema of AllUnits, updates the mapping between the virtual relation accepting the new tuple and AllUnits.

We intend to provide in the multi-function query processor (Figure 2) a broad range of services over the published data to the
user, including:

Virtual schema browsing: The user can browse the virtual relation schemas in a manner similar to browsing the semantic categories of a large content classification system, in order to narrow the query to the one or more categories he or she is interested in.

Keyword search: Given a keyword query, we return a list of matching data units ranked according to their estimated relevance to the query.

Structured querying: We will also provide a structured query interface for the user to issue structured queries over one or more virtual relations. The structured query will be transformed into an SQL-like query over the generic table and executed over it.

The data unit categorizer is the most important component of our system. It constructs and maintains the virtual relations that represent the various types of data units published to the system. First of all, we need to answer a fundamental question: what is a qualified virtual relation in our collaborative publishing and searching system? Although the ultimate answer is subjective, we devise several objective heuristic metrics, described in the following. First, we test whether there are sufficient dominant attributes in the virtual relation, i.e., attributes for which a dominant number of its tuples have non-null values. Second, we test whether there are dominant tags, i.e., tags used by a dominant number of tuples. Third, we check whether the average similarity between the attributes of the virtual relation and the attributes of each tuple in it is above a predefined threshold. Our hypothesis is that data units with both similar attributes and similar tags are likely to have similar topics and can be queried together, i.e., over the same relation schema.

Therefore, data units published to the system are assigned to different
virtual relations based on both their structures and their topics, i.e., we cluster the data units from two aspects. First, the data units grouped together should have a similar structure: they are described with roughly similar attributes, so that they can be represented with a uniform relational schema. Second, the data units in a virtual relation should describe similar topics. Specifically, we need to discuss several problems, which are presented in the following sections.

A. Analysis of data unit and virtual relation

In our data model, the schema of a relation is determined by the attributes of the tuples stored in it (in contrast to conventional settings, where the schema is designed by domain experts a priori). Therefore, our task is to discover structure among incoming data units, and to form relational schemas for groups of data units with similar topics.

Representation of features of data units: The features of a data unit include its attributes and tags. The attributes and tags of a data unit are both textual, which suggests an IR-related representation of the features. We employ the conventional vector space model [12] to represent both tags and attributes as bags of words. Each data unit is associated with two feature vectors: an attributes vector (AV) and a tags vector (TV). The dimensions of the vectors are the unique attributes and tags of all the data units already stored in the system. An element is set to 1 if the corresponding attribute or tag appears in the data unit, and to 0 otherwise.

Representation of features of virtual relations: From the point of view of a relation, the features of a virtual relation are its schema elements, i.e., its set of attributes, and the content of its tuples, i.e., the set of tags of its tuples. Similarly, they can also be represented with the vector space model as two feature
vectors, AV and TV. The elements are related to two possible factors. The first is the popularity of the corresponding attributes or tags. Given a relation $R$ and its instance $r$, the popularity of an attribute $A$ is defined as

$$popularity(A) = \frac{|\{t \mid t \in r,\ t[A] \neq null\}|}{|\{t \mid t \in r\}|}$$

and the popularity of a tag $tag$ is

$$popularity(tag) = \frac{|\{t \mid t \in r,\ match(t[tags], tag)\}|}{|\{t \mid t \in r\}|}.$$

The popularity values of attributes and tags show their importance within a virtual relation. The second factor is the inverse relation frequency (IRF) of attributes and tags in a virtual relation. The IRF value is an analog of the IDF statistic in the conventional information retrieval literature. It is defined as the ratio of the number of virtual relations having the attribute/tag to the total number of virtual relations in the system. In our current implementation, we use the popularity value to measure the importance of an attribute/tag in a virtual relation.

Growing vocabularies: Since data units are added continuously, the vocabularies of both tags and attributes grow over time, and the virtual relations change as well. This affects corpus-level statistics such as IRF, which changes over time as new data units are added and may in turn affect term weighting and the clustering task. [6] mentions two solutions to this problem. The first approach is to use such statistics from another, similar corpus. This is not practical in our context, since we cannot find an existing corpus that can be presumed to be similar to the data in our system. The second approach is to estimate the IRF statistics dynamically as data units are continuously added to the system. Initially the values may not be accurate. However, what we care about is whether the IRF values converge quickly as more and more data units are added to the system. We conducted an initial experiment on this issue. Figure 3 plots the changes of the IRF statistics of attributes and tags, measured as mean squared error, as more data units are added and clustered. It can be seen from the figure that the IRF statistics of the attributes converge more quickly than those of the tags. We believe this is because the tag vocabulary is larger than the attribute vocabulary. Currently we take the second approach to estimate the IRF statistics.
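As a concrete reading of the two statistics defined in this section, the following sketch computes popularity and IRF over toy data. It rests on several assumptions that are not from the paper: a virtual relation is a list of dictionaries, a missing key stands for null, tags are held in a set under the key "tags", and the match operator is simplified to exact tag membership. The function names are hypothetical. IRF is computed exactly as worded in the text, i.e., the share of virtual relations that contain the attribute.

```python
def popularity(relation, attribute):
    """Fraction of tuples in the relation whose value for the attribute is non-null."""
    if not relation:
        return 0.0
    non_null = sum(1 for t in relation if t.get(attribute) is not None)
    return non_null / len(relation)


def tag_popularity(relation, tag):
    """Fraction of tuples carrying the tag (match simplified to set membership)."""
    if not relation:
        return 0.0
    return sum(1 for t in relation if tag in t.get("tags", ())) / len(relation)


def irf(relations, attribute):
    """Share of virtual relations having the attribute, following the wording in the text."""
    if not relations:
        return 0.0
    having = sum(1 for r in relations
                 if any(t.get(attribute) is not None for t in r))
    return having / len(relations)


# Two toy virtual relations.
blogs = [
    {"homepage": "h1", "language": "en", "tags": {"blog", "tech"}},
    {"homepage": "h2", "tags": {"blog"}},
]
news = [
    {"news source": "wire", "language": "en", "tags": {"news"}},
]

print(popularity(blogs, "language"))   # half of the blog tuples have a language
print(tag_popularity(blogs, "blog"))   # every blog tuple carries the tag "blog"
print(irf([blogs, news], "homepage"))  # one of the two relations has "homepage"
```

Recomputing these values after each batch of insertions is one simple way to realize the dynamic estimation the text describes, at the cost of rescanning the relations.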

Fig. 3. Changes of IRF statistics. (Plot of the mean squared error of the IRF statistics against the number of data units added, with one curve for tags and one for attributes.)

Discussion: Our experimental data set contains over 11,000 data units extracted from data crawled from Google Base. According to this data, the distributions of the attributes and tags are power-law-like, as shown in Figure 4. More research needs to be done on how to select the proper portion of the vocabularies of tags and attributes as the features of the vectors. As can be seen from the figure, a large portion of the attributes and tags have very small frequency; these may be trivial and need not be selected as features of a virtual relation.

In addition, there are problems with the uncontrolled vocabulary of attributes and tags contributed by users, such as synonyms, polysemy, superordinate and subordinate relationships, and "noise" among the attributes or tags. It is a challenging task to detect the occurrences of these problems and extract useful features from data units.
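One simple way to drop the long tail of rare terms (and, if desired, overly common ones) is a frequency band filter. The sketch below is an illustration only: the function name `select_features`, the parameters `low` and `high`, and the sample data are hypothetical, and counting each term once per data unit is an assumption about how frequency would be measured. It does nothing about synonyms or polysemy.

```python
from collections import Counter


def select_features(data_units, low, high):
    """Keep terms whose per-data-unit frequency lies in the closed range [low, high]."""
    freq = Counter()
    for terms in data_units:
        freq.update(set(terms))  # count each term at most once per data unit
    return {term for term, n in freq.items() if low <= n <= high}


# Each data unit is represented by the list of attribute names it uses.
units = [
    ["price", "color"],
    ["price", "brand"],
    ["price", "colour"],
    ["color", "brand"],
]

print(select_features(units, 2, 3))
```

With these toy inputs, the misspelling-like variant "colour" appears in a single data unit and is filtered out when `low` is 2, which shows both the benefit and the limit of the approach: rare noise disappears, but so would any legitimately rare attribute.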

Fig. 4. Distribution of attributes and tags. (Two panels: frequency of attributes and frequency of tags.)

B. Similarity measure

We consider two kinds of similarities between data units $d_i$ and $d_j$. The first is the structural similarity, determined by the similarity between their attributes vectors. The second is the topic similarity, based on the overlap of their tags vectors. Since the features extracted from a data unit are text-based, like the document vectors in the standard information retrieval model, we apply the cosine similarity measure widely used in IR to measure the similarity both between a pair of AV vectors and between a pair of TV vectors. That is,

$$sim(AV_i, AV_j) = \frac{AV_i \cdot AV_j}{\|AV_i\| \, \|AV_j\|}, \qquad sim(TV_i, TV_j) = \frac{TV_i \cdot TV_j}{\|TV_i\| \, \|TV_j\|}.$$

The similarity between a data unit and a virtual relation, and between two virtual relations, is defined similarly.

C. Clustering method

Since data units are added to the system incrementally over time, our clustering method should also be incremental and adaptive to dynamic changes as data volume and topics grow. We consider the incremental clustering model presented in [7]. Based on our data model, it can be described as follows. Suppose the data units published to the system arrive in a sequence $d_1, d_2, \ldots$. A collection of up to $k$ virtual relations is maintained such that, as each new data unit is presented, either it is assigned to one of the current virtual relations, or a new virtual relation is created, storing it as the only tuple, based on a comparison of the similarities between it and all the virtual relations. If the resulting number of virtual relations exceeds $k$, two existing virtual relations are merged into one.

With two types of similarity measures between data units, we use a two-phase incremental clustering method that applies the incremental clustering model above at each phase, i.e., the data units are first clustered based on one similarity measure, and then clustered into finer groups based on the other similarity measure. Therefore, the first problem is to decide the order in which to use the two kinds of similarity measures. We observed in Figure 4 that the attribute vocabulary is much smaller than the tag vocabulary, and in general the same attributes appear more frequently in different data units than the same tags do. This suggests that we should cluster data units according to their structural similarity in the first phase, which may
result in fewer clusters that are large in size. Then we further cluster data units into finer groups based on their topic similarity in the second phase. To distinguish the clusters formed in the two phases, we call the clusters formed at phase 1 tier-1 virtual relations and the clusters formed at phase 2 tier-2 virtual relations. A tier-1 virtual relation consists of a set of tier-2 virtual relations. Tier-2 virtual relations are the final virtual relations browsed by the user.

Basically, when a new data unit $d$ is added to the system, the following steps may take place:

1) Extract the feature vectors of $d$.
2) Compare the structural similarity between $d$ and each of the existing tier-1 virtual relations. Choose the virtual relation that has the maximum similarity value with $d$.
3) If the maximum similarity value exceeds a predefined threshold, assign the data unit to that virtual relation as a new tuple.
4) If none of the existing virtual relations has a similarity value greater than the predefined threshold, create a new virtual relation and assign the new data unit to it as the only tuple.
5) If the number of virtual relations exceeds a predefined number, choose the two most similar virtual relations and merge them into a new virtual relation.

When a data unit is assigned to a tier-1 virtual relation, it is further clustered into a tier-2 virtual relation within it, in a way similar to the steps described above but based on the topic similarity measure.

Several parameters need to be adjusted by empirical study:

1) The number of tier-1 virtual relations and the number of tier-2 virtual relations.
2) Structural similarity threshold: Only when the structural similarity between a data unit and a virtual relation, or between two virtual relations, is higher than this threshold are the two considered a match.
3) Topic similarity threshold: It determines whether the topic similarity between a data unit and a tier-2 virtual relation is high enough to assign the data unit to the tier-2 virtual relation.
4) Frequency thresholds: Instead of using the entire vocabularies, we consider selecting as features of a data unit only the attributes or tags whose frequency lies between a lower and an upper threshold.

Discussion: Currently we assume each data unit is assigned to only one cluster. We may consider allowing a data unit to be assigned to multiple virtual relations, which will make the problem more complex. There are also issues to be considered on how to adjust the clusters when obsolete data units are
deleted from the generic table. In addition, our current approach treats attributes and tags independently when categorizing data units. It might be useful to detect association patterns among attributes and tags in order to make the clustering process more effective.

We expect the generic table to contain thousands of attributes, and the tuples in it to be sparse: they have null values in most of the attributes. This poses a challenge to the physical storage scheme for the generic table. We are still evaluating various storage strategies, including the interpreted storage method for relational tuples proposed in [5]. With interpreted storage, only non-null values are actually stored. Each tuple is stored as a list of non-null values associated with their attribute identifiers. To handle the dynamism of the data and its huge volume, we need a method that scales with the number of attributes and the data volume.

The storage manager is also in charge of maintaining the mapping between each virtual relation $V$ and AllUnits, which is a query $Q$ over the instance of AllUnits, i.e., $V \mapsto Q(AllUnits)$. As data items are assigned to $V$, $Q$ needs to be modified as well. When the mapping from the attributes of $V$ to the attributes of AllUnits is one-to-one, $Q$ is a simple projection over the subset of attributes of AllUnits that match the attributes of $V$. There can also be the case that an attribute of $V$ is mapped to multiple attributes of AllUnits, when the multiple attributes are detected as synonyms. In this case, $Q$ is a union over multiple projections over different combinations of the attributes of AllUnits, according to the attribute mappings. At the current stage, we only consider simple one-to-one attribute mappings. We will investigate the issues with one-to-many mappings at a later time.

Users
can look for interesting information by browsing the schemas of the virtual relations. The virtual relations are presented to the user in decreasing order of the number of tuples they contain. The query processor supports both simple keyword queries and structured queries posed over the virtual relations. A keyword query can be posed to a particular virtual relation, or to the whole system. A full-text inverted index is maintained that associates each keyword with the name of the attribute, the id of the tuple, and the virtual relation it appears in. A keyword query is processed by looking up the index and locating the positions of the relevant tuples in the generic table. Structured queries are processed by reformulating the queries posed over the virtual relations into queries over the generic table, according to the mappings between them, and executing them over it.

We have presented our design and implementation of a collaborative publishing and searching system. We devise a data model for representing and storing the data units. We also present our approach for each system component and discuss the research challenges. Currently our focus is on meaningful clustering and effective query processing on the data units. Our ongoing work includes designing efficient indexes on the virtual relations and data units to facilitate efficient retrieval, and efficient and adaptive query processing strategies, since the fewer constraints imposed on storage increase the complexity of query processing. One of the opportunities we see in such a generic storage structure is data sharing over a P2P system. We intend to adopt it on our BestPeer [11] P2P platform after we have resolved various technical issues on a centralized server.

[1] Delicious website. http://del.icio.us/.
[2] Flickr website. http://www.flickr.com/.
[3] Google Base website. http://base.google.com/.
[4] Liveplasma website. http://www.liveplasma.com/.
[5] J. L. Beckmann, A. Halverson, R. Krishnamurthy, and J. Naughton. Extending RDBMSs to support sparse datasets using an interpreted attribute storage format. In ICDE, 2006.
[6] J. Callan. Document filtering with inference networks. In SIGIR '96: Proceedings of the 19th annual international ACM SIGIR conference on Research and development in information retrieval, 1996.
[7] M. Charikar, C. Chekuri, T. Feder, and R. Motwani. Incremental clustering and dynamic information retrieval. In Symposium on Theory of Computing, 1997.
[8] R. Fagin, A. O. Mendelzon, and J. D. Ullman. A simplified universal relation assumption and its properties. ACM Trans. Database Syst., 7(3), 1982.
[9] S. Golder and B. A. Huberman. The structure of collaborative tagging systems. Technical report, Information Dynamics Lab, HP Labs, 2005.
[10] D. Maier, J. D. Ullman, and M. Y. Vardi. On the foundations of the universal relation model. ACM Trans. Database Syst., 9(2), 1984.
[11] W. S. Ng, B. C. Ooi, and K. L. Tan. BestPeer: A self-configurable peer-to-peer system. In ICDE, 2002. Poster paper.
[12] G. Salton. Dynamic document processing. Communications of the ACM, 17(7):658–668, 1972.