The use of an intelligent forum crawler for data retrieval


Miloš Pavković and Jelica Protić, University of Belgrade, School of Electrical Engineering, Belgrade, Serbia. 6th International Conference on Education and New Learning Technologies



Presentation Transcript

Slide1

The use of an intelligent forum crawler for data retrieval from e-learning portals

Miloš Pavković and Jelica Protić, University of Belgrade, School of Electrical Engineering, Belgrade, Serbia

6th International Conference on Education and New Learning Technologies, Barcelona, 7th - 9th of July 2014

Slide2

Introduction

A large number of forums with different topics
Forums are often used by students during their studies
A large amount of relevant information is scattered across different forums inside one university domain
Forums are based on different technologies

Slide3

Issues

The same topic can appear across different forums inside one university domain
Official school forums vs. independent departmental forums
The same documents can be uploaded as post attachments to several different web forums
Similar courses at different schools

Slide4

Solution – Specialized crawler

Specialized forum crawler
Aggregation of crawled data from multiple forums of a single university domain
Storing data into a database
Forum modules that use this database for helping students

Slide5

Forum structure

Always defined by the presented implicit paths
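A minimal sketch of how implicit paths of this kind might be recognized with regular expressions. Only the `showforum` parameter appears in the slides; the `showtopic` and `attachment.php` patterns, the `classify_links` helper, and the sample HTML are assumptions made for illustration, and real forum software uses its own URL schemes.

```python
import re

# Hypothetical URL patterns. Only "showforum" comes from the presentation;
# the thread and attachment patterns are illustrative assumptions.
IMPLICIT_PATHS = {
    "forum": re.compile(r"index\.php\?showforum=(\d+)"),
    "thread": re.compile(r"index\.php\?showtopic=(\d+)"),
    "attachment": re.compile(r"attachment\.php\?id=(\d+)"),
}

# Naive href extraction; a real crawler would use an HTML parser.
HREF = re.compile(r'href="([^"]+)"')


def classify_links(html):
    """Return (kind, numeric id, url) for every link matching a known path."""
    found = []
    for url in HREF.findall(html):
        for kind, pattern in IMPLICIT_PATHS.items():
            match = pattern.search(url)
            if match:
                found.append((kind, int(match.group(1)), url))
                break
    return found


sample = (
    '<a href="index.php?showforum=12">Algorithms</a>'
    '<a href="index.php?showtopic=345">Exam questions</a>'
    '<a href="about.html">About</a>'
)
links = classify_links(sample)
```

Links that match none of the patterns, such as `about.html` here, are simply skipped, so the crawler only follows paths that belong to the forum structure.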

Figure: example of a) a forum, b) a thread, c) attachments inside a post.

Slide6

Crawler algorithm

FCbRE – Forum Crawler based on Regular Expressions
Automated system
Identifying DOM structure and basic forum elements with regular expressions
Identifying forum implicit paths using regex
Example: >>index\.php\?showforum\==\digit+!>+>\P=!<+
Extraction of post content and storing into the database

Slide7

Crawler database

Essential in the FCbRE model
Forum threads and posts are stored separately
Similarity tables contain unique pairs of identifiers of forums, threads and attachments
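A rough sketch of how the tables listed on this slide could be declared, here using SQLite through Python's standard `sqlite3` module. Table and column names follow the slide; the column types, the reading of `+` fields as references to a parent table, the `CHECK` constraint, and the `add_thread_similarity` helper are all assumptions rather than the authors' actual schema. Only one similarity table is shown; the others would follow the same pattern.

```python
import sqlite3

# Names follow the slide; types and constraints are assumptions.
SCHEMA = """
CREATE TABLE web_forum (
    site_id   INTEGER PRIMARY KEY,
    site_name TEXT,
    site_link TEXT
);
CREATE TABLE forums (
    forum_id   INTEGER PRIMARY KEY,
    site_id    INTEGER NOT NULL REFERENCES web_forum(site_id),
    forum_name TEXT,
    forum_link TEXT
);
CREATE TABLE threads (
    thread_id   INTEGER PRIMARY KEY,
    forum_id    INTEGER NOT NULL REFERENCES forums(forum_id),
    thread_name TEXT,
    thread_link TEXT
);
CREATE TABLE posts (
    post_id   INTEGER PRIMARY KEY,
    thread_id INTEGER NOT NULL REFERENCES threads(thread_id),
    post_info TEXT
);
CREATE TABLE attach (
    attach_id   INTEGER PRIMARY KEY,
    post_id     INTEGER NOT NULL REFERENCES posts(post_id),
    attach_name TEXT,
    attach_link TEXT
);
CREATE TABLE thread_simil (
    thread_id_1 INTEGER NOT NULL REFERENCES threads(thread_id),
    thread_id_2 INTEGER NOT NULL REFERENCES threads(thread_id),
    PRIMARY KEY (thread_id_1, thread_id_2),
    CHECK (thread_id_1 < thread_id_2)
);
"""


def add_thread_similarity(conn, a, b):
    """Store the pair in a fixed order so (a, b) and (b, a) are one row."""
    low, high = sorted((a, b))
    conn.execute("INSERT OR IGNORE INTO thread_simil VALUES (?, ?)", (low, high))


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO web_forum VALUES (1, 'Example site', 'http://forum.example/')")
conn.execute("INSERT INTO forums VALUES (1, 1, 'Algorithms', 'index.php?showforum=1')")
conn.executemany(
    "INSERT INTO threads VALUES (?, 1, ?, ?)",
    [(10, "Exam questions", "index.php?showtopic=10"),
     (11, "Exam question", "index.php?showtopic=11")],
)
add_thread_similarity(conn, 11, 10)
add_thread_similarity(conn, 10, 11)  # same pair, ignored
pairs = conn.execute("SELECT thread_id_1, thread_id_2 FROM thread_simil").fetchall()
```

Ordering each pair before insertion is one simple way to keep the pairs unique, as the slide requires, without checking both orderings at query time.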

Forums
+ site id
- forum id
- forum name
- forum link

Threads
+ forum id
- thread id
- thread name
- thread link

Posts
+ thread id
- post id
- post info

Attach
+ post id
- attach id
- attach name
- attach link

Web Forum
- site id
- site name
- site link

F – Simil.
+ forum id (1)
+ forum id (2)

T – Simil.
+ thread id (1)
+ thread id (2)

F/T – Simil.
+ forum id
+ thread id

A – Simil.
+ attach id (1)
+ attach id (2)

Slide8

Finding similarities

Determining similarities of forums, threads or document names
It is not enough to just compare the words
  grammatical errors
  singular/plural form
  different form but the same semantic meaning
Using existing search engines to distinguish semantics
FCbRE uses low-level semantic difference

Slide9

Module plugins

Two module plugins
  FCbRE-S (FCbRE Search plugin)
  FCbRE-DP (FCbRE Duplicate Prevention plugin)
Both used for experimental purposes
Written for vBulletin technology
Can be adapted for any other forum technology

Slide10

FCbRE-S (FCbRE Search plugin)

Designed for standard forum searches
Forwards the requested query to the FCbRE database for similarity comparison
All similarities are shown in addition to the standard search results
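The behaviour described above might look roughly like the following sketch, which appends similar titles from the crawler database to the forum's own search results. It is written in Python for brevity rather than as a vBulletin plugin. The `search_with_similar` function, the 0.8 threshold, and the use of `difflib.SequenceMatcher` as the similarity measure are assumptions; the slides do not specify how FCbRE-S scores similarity.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive character-level similarity in the range [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def search_with_similar(query, standard_results, stored_titles, threshold=0.8):
    """Return the standard results followed by similar titles from the database.

    standard_results: titles returned by the forum's own search.
    stored_titles: titles held in the (hypothetical) crawler database.
    """
    similar = [
        title for title in stored_titles
        if title not in standard_results and similarity(query, title) >= threshold
    ]
    # Most similar titles first.
    similar.sort(key=lambda title: similarity(query, title), reverse=True)
    return list(standard_results) + similar


results = search_with_similar(
    "exam questions",
    standard_results=["Exam questions"],
    stored_titles=["Exam question", "Lab schedule", "Exam questions"],
)
```

Keeping the standard results first and unchanged matches the slide's point that similarities are an addition to the normal search output, not a replacement for it.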

Slide11

FCbRE-DP (Duplicate Prevention plugin)

Implemented in the section where users can create a topic or forum
Monitors the field for the name of the new thread or forum
Notifies the user that a similarity exists
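As an illustration of the check such a plugin could run on the title field, the sketch below normalizes titles before comparing them, so that word order, punctuation, and simple singular/plural differences do not hide a duplicate. The `normalize` and `find_similar_titles` helpers, the naive plural stripping, the Jaccard token overlap, and the 0.6 threshold are all assumptions; this is not the low-level semantic difference measure used by FCbRE, which the slides do not detail.

```python
import re


def normalize(title):
    """Lowercase, split into words, and strip a trailing 's' from longer words."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return {w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words}


def jaccard(a, b):
    """Share of tokens the two sets have in common."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def find_similar_titles(new_title, existing_titles, threshold=0.6):
    """Return existing titles that look like duplicates of new_title."""
    new_tokens = normalize(new_title)
    return [
        title for title in existing_titles
        if jaccard(new_tokens, normalize(title)) >= threshold
    ]


warnings = find_similar_titles(
    "Exam Questions - Algorithms",
    ["Algorithms exam question", "Lab schedule"],
)
```

A non-empty result would be the cue for the plugin to notify the user before the new thread or forum is created.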

Slide12

Results

9 web forums from the University of Belgrade, manually gathered
This group is a mixture from different sources
The percentage of similar items is smallest for forums and highest for documents
The true percentage of "useful" duplicates should be taken with caution

Slide13

Conclusion

The proposed solution performs information aggregation of related forums
It has potential in reducing duplication of forums, topics and posts
The use of plugins would result in higher forum content quality

Slide14

Thank you!


Feel free to contact us and ask any question that you may find interesting

milos_pavkovic@yahoo.com

jeca@etf.bg.ac.rs