The use of an intelligent forum crawler for data retrieval from e-learning portals
Miloš Pavković and Jelica Protić, University of Belgrade, School of Electrical Engineering, Belgrade, Serbia
6th International Conference on Education and New Learning Technologies, Barcelona, 7th - 9th of July 2014
Introduction
A large number of forums with different topics
Forums are often used by students during their studies
A large amount of relevant information is scattered across different forums inside one university domain
Forums are based on different technologies
Issues
The same topic can appear across different forums inside one university domain
Official school forums vs. independent department forums
The same documents can be uploaded as post attachments to several different web forums
Similar courses at different schools
Solution – Specialized crawler
Specialized forum crawler
Aggregation of crawled data from multiple forums of a single university domain
Storing the data in a database
Forum modules that use this database to help students
Forum structure
Always defined by the presented implicit paths
Figure: example of a) forum, b) thread, c) attachments inside a post.
Crawler algorithm
FCbRE – Forum Crawler based on Regular Expressions
Automated system
Identifying DOM structure and basic forum elements with regular expressions
Identifying forum implicit paths using regex
Example: >>index\.php\?showforum\==\digit+!>+>\P=!<+
Extraction of post content and storing into the database
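As a rough illustration of identifying implicit paths with regular expressions, the following Python sketch pulls forum links out of a page. The example pattern on the slide did not survive extraction intact, so the pattern used here is an assumption: it targets links of the form index.php?showforum=<digits>. The function name, the record fields and the sample HTML are hypothetical and are not taken from FCbRE.

```python
import re

# Assumed pattern (not the original FCbRE expression): an anchor whose href
# contains index.php?showforum=<digits>. It captures the link target,
# the numeric forum id and the link text (taken as the forum name).
FORUM_LINK = re.compile(
    r'<a\s[^>]*href="([^"]*index\.php\?showforum=(\d+)[^"]*)"[^>]*>(.*?)</a>',
    re.IGNORECASE | re.DOTALL,
)


def extract_forums(html: str) -> list[dict]:
    """Return one record per forum link found in the page source."""
    forums = []
    for link, forum_id, name in FORUM_LINK.findall(html):
        forums.append({
            "forum_id": int(forum_id),
            # Drop any nested tags such as <b>...</b> from the link text.
            "forum_name": re.sub(r"<[^>]+>", "", name).strip(),
            "forum_link": link,
        })
    return forums


# Hypothetical page fragment used only to exercise the pattern.
SAMPLE_HTML = """
<a href="index.php?showforum=12">Operating Systems</a>
<a href="index.php?showtopic=7">Exam dates</a>
<a class="f" href="http://example.edu/forum/index.php?showforum=3"><b>Databases</b></a>
"""

if __name__ == "__main__":
    for forum in extract_forums(SAMPLE_HTML):
        print(forum)
```

Thread and attachment links would need their own patterns in the same style; only the forum level is sketched here.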
Crawler database
Essential in the FCbRE model
Forum threads and posts are stored separately
Similarity tables contain unique pairs of identifiers of forums, threads and attachments
Forums
+ site id
- forum id
- forum name
- forum link
Threads
+ forum id
- thread id
- thread name
- thread link
Posts
+ thread id
- post id
- post info
Attach
+ post id
- attach id
- attach name
- attach link
Web Forum
- site id
- site name
- site link
F – Simil.
+ forum id (1)
+ forum id (2)
T – Simil.
+ thread id (1)
+ thread id (2)
F/T – Simil.
+ forum id
+ thread id
A – Simil.
+ attach id (1)
+ attach id (2)
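The tables listed above can be sketched as an SQLite schema. This is a minimal sketch under stated assumptions: the "+" entries are read as references to another table, the column types are guesses, and the snake_case table and column names are hypothetical renderings of the names in the diagram. The slides do not say which database system FCbRE uses.

```python
import sqlite3

# Column types and key choices are assumptions; the slide lists names only.
# Note: SQLite does not enforce REFERENCES unless PRAGMA foreign_keys = ON.
SCHEMA = """
CREATE TABLE web_forum (
    site_id INTEGER PRIMARY KEY, site_name TEXT, site_link TEXT);
CREATE TABLE forums (
    forum_id INTEGER PRIMARY KEY,
    site_id INTEGER REFERENCES web_forum(site_id),
    forum_name TEXT, forum_link TEXT);
CREATE TABLE threads (
    thread_id INTEGER PRIMARY KEY,
    forum_id INTEGER REFERENCES forums(forum_id),
    thread_name TEXT, thread_link TEXT);
CREATE TABLE posts (
    post_id INTEGER PRIMARY KEY,
    thread_id INTEGER REFERENCES threads(thread_id),
    post_info TEXT);
CREATE TABLE attach (
    attach_id INTEGER PRIMARY KEY,
    post_id INTEGER REFERENCES posts(post_id),
    attach_name TEXT, attach_link TEXT);
-- Similarity tables: each row is one unique pair of identifiers.
CREATE TABLE forum_similarity (
    forum_id_1 INTEGER, forum_id_2 INTEGER,
    PRIMARY KEY (forum_id_1, forum_id_2));
CREATE TABLE thread_similarity (
    thread_id_1 INTEGER, thread_id_2 INTEGER,
    PRIMARY KEY (thread_id_1, thread_id_2));
CREATE TABLE forum_thread_similarity (
    forum_id INTEGER, thread_id INTEGER,
    PRIMARY KEY (forum_id, thread_id));
CREATE TABLE attach_similarity (
    attach_id_1 INTEGER, attach_id_2 INTEGER,
    PRIMARY KEY (attach_id_1, attach_id_2));
"""


def create_database(path: str = ":memory:") -> sqlite3.Connection:
    """Create the sketched schema and return the open connection."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_thread_similarity(conn: sqlite3.Connection, a: int, b: int) -> None:
    """Store a pair of similar threads once, regardless of argument order."""
    first, second = sorted((a, b))
    conn.execute(
        "INSERT OR IGNORE INTO thread_similarity VALUES (?, ?)",
        (first, second),
    )
```

Sorting the two identifiers before inserting is one simple way to keep each pair unique, so that (2, 5) and (5, 2) end up as the same row.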
Finding similarities
Determining similarities of forum, thread or document names
It is not enough to just compare the words:
  grammatical errors
  singular/plural forms
  different form but the same semantic meaning
Using existing search engines to distinguish semantics
FCbRE uses low-level semantic difference
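The slides do not specify how the comparison is computed. To illustrate why exact word comparison falls short, the sketch below scores two names after light normalization, so that spelling slips and singular/plural variants still match. It uses Python's difflib as a stand-in technique and is not FCbRE's method; the plural rule, the threshold and the function names are assumptions. It does not address names that differ in form but share a meaning, which the slides handle with search engines.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> list[str]:
    """Lowercase, split into words and crudely strip a plural 's'."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    # Assumed rule: drop a trailing "s" from words longer than 3 characters.
    return [w[:-1] if len(w) > 3 and w.endswith("s") else w for w in words]


def name_similarity(a: str, b: str) -> float:
    """Return a character-level similarity score between 0.0 and 1.0."""
    left = " ".join(normalize(a))
    right = " ".join(normalize(b))
    return SequenceMatcher(None, left, right).ratio()


def is_similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Treat two names as similar when the score reaches the threshold."""
    return name_similarity(a, b) >= threshold


if __name__ == "__main__":
    print(is_similar("Operating Systems exams", "operating system exam"))
    print(is_similar("Operting systems exam", "Operating system exam"))
    print(is_similar("Operating systems", "Linear algebra"))
```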
Module plugins
Two module plugins:
  FCbRE-S (FCbRE Search plugin)
  FCbRE-DP (FCbRE Duplicate Prevention plugin)
Both are used for experimental purposes
Written for vBulletin technology
Can be adapted for any other forum technology
FCbRE-S (FCbRE Search plugin)
Designed for standard forum searches
Forwards the requested query to the FCbRE database for similarity comparison
All similarities are shown as an addition to the standard search results
FCbRE-DP (Duplicate Prevention plugin)
Implemented in the section where users can create a topic or forum
Monitors the field for the name of the new thread or forum
Notifies the user that a similarity exists
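The check behind such a notification could look roughly like the sketch below. It is written in Python for illustration only (an actual vBulletin plugin would be written in PHP), and the function names, the cutoff and the message text are hypothetical. The matching uses difflib's get_close_matches as a stand-in for the FCbRE similarity comparison.

```python
from difflib import get_close_matches
from typing import Optional


def find_similar_names(new_name: str, existing_names: list[str],
                       cutoff: float = 0.75) -> list[str]:
    """Return existing thread or forum names that closely match the new one."""
    # Compare case-insensitively, but report names in their original spelling.
    lowered = {name.lower(): name for name in existing_names}
    matches = get_close_matches(new_name.lower(), list(lowered),
                                n=5, cutoff=cutoff)
    return [lowered[m] for m in matches]


def duplicate_warning(new_name: str,
                      existing_names: list[str]) -> Optional[str]:
    """Build the message shown to the user, or None when nothing is similar."""
    similar = find_similar_names(new_name, existing_names)
    if not similar:
        return None
    return "Similar entries already exist: " + ", ".join(similar)


if __name__ == "__main__":
    # Hypothetical names standing in for rows of the crawler database.
    existing = ["Operating Systems exam", "Linear Algebra homework",
                "Databases lab"]
    print(duplicate_warning("Operating system exams", existing))
    print(duplicate_warning("Computer graphics", existing))
```

In a forum, this check would run whenever the title field changes, with the existing names read from the crawler database rather than from a fixed list.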
Results
9 web forums from the University of Belgrade, manually gathered
This group is a mixture from different sources
The percentage of similar forums is the smallest, while the percentage of similar documents is the highest
The true percentage of "useful" duplicates should be taken with caution
Conclusion
The proposed solution aggregates information from related forums
It has the potential to reduce duplication of forums, topics and posts
The use of plugins would result in higher forum content quality
Thank you!
Feel free to contact us and ask any question that you may find interesting
milos_pavkovic@yahoo.com
jeca@etf.bg.ac.rs