International Journal of Emerging Trends & Technology in Computer Science (IJETTCS)
Web Site: www.ijettcs.org Email: editor@ijettcs.org, editorijettcs@gmail.com
Volume 3, Issue 2, March – April 2014, ISSN 2278-6856

ANTRANK: AN ANT COLONY ALGORITHM FOR RANKING WEB PAGES

G. Anuradha 1, G. Lavanya Devi 2 and M.S Prasad Babu 3
1 Research Scholar, Dept. of CS&SE, Andhra University, Visakhapatnam, India
2 Assistant Professor, Dept. of CS&SE, Andhra University, Visakhapatnam, India
3 Professor, Dept. of CS&SE, Andhra University, Visakhapatnam, India

Abstract: As the web grows rapidly, search engines play a central role in retrieving data from it. A search for a topic returns hundreds of results [6], and it is practically impossible to visit every page to find the relevant information. Page ranking algorithms therefore play a central role in making navigation easier for the user. Traditional PageRank has two drawbacks: first, it needs access to the entire web structure to perform a proper computation; second, when a high-weight web page references a low-quality page, the ranking position of the low-quality page rises. This paper focuses mainly on the second drawback. It proposes AntRank (ant colony based ranking algorithm), an application of the ant colony algorithm to ranking web pages. The goal of AntRank is to assign ranks to web pages in a search engine, inspired by the behavior of real ant colonies. Artificial ants visit web pages one by one at random, and ranks are assigned to the pages based on pheromone (user interest).

Keywords: Web mining, Page rank, Ant colony, Search Engine.

1. INTRODUCTION

Web mining is the process of mining data present on the World Wide Web in the form of web pages. Depending on the needs of the mining process, web mining can be subdivided into three types: web structure mining, web content mining and web usage mining. Web structure mining operates on the hyperlink structure of the web. Web content mining extracts useful information from page contents. Web usage mining extracts knowledge from usage patterns, i.e. the visitor traffic information, queries, related clicks etc. that people generate when they interact with the web [1].

A search engine is a software system designed to retrieve information from the World Wide Web (Wikipedia). Today the WWW is the largest information repository for knowledge reference. Finding high-quality web pages is a challenging issue for any web search engine. The quality of a page is defined by user preferences. The ranking problem is to display a result list based on the user's request or preferences [11]. To make the web more interesting and productive, we need a good and efficient ranking algorithm for searching. We propose a ranking algorithm inspired by the behavior of real ant colonies. In this algorithm, ranks are assigned to web pages according to users' interest in clicking on the web link. In traditional approaches a search engine always returns the same ranks for the same query, whether it is submitted at different times or by different users [12]. In this algorithm the values on the web pages do not stay the same over time, because users' interest in a page may vary.

Table - 1: Present Search Engine Scenario [5]

                                  2012   2013   % Change
Portals and Search Engines        79     76     -3.8%
Google Inc.                       82     77     -6.1%
Bing (Microsoft Corporation)      81     76     -6.2%
Yahoo! Inc.                       78     76     -2.6%
MSN (Microsoft Corporation)       78     74     -5.1%
AOL LLC (Time Warner Inc.)        74     71     -4.1%

(As per ForeSee Results for the American Customer Satisfaction Index [ACSI])

The table above shows the ForeSee survey results on user satisfaction with search engine relevance. Users' expectations are outpacing their experiences with search engines. The satisfaction score fell from 2012 to 2013 for every search engine listed, including Google. Hence user interest has to be given priority. With this in view, we propose the AntRank algorithm, which takes user interest into consideration when retrieving the results for a given query.

The PageRank algorithm considers the importance of the pages that cite a web page, which is said to be the main advantage of the Google search engine [8]. We attribute the loss of relevance in the retrieved data (as reflected in the ForeSee ACSI results) to this reference concept of web pages. To avoid this problem and achieve relevance in the search results, we follow the ant colony optimization technique.

The rest of this paper is organized as follows: Section 2 presents a brief overview of the PageRank algorithm and the ant colony algorithm, Section 3 introduces the ant colony algorithm for web page ranking (AntRank), and Section 4 explains the implementation of the proposed method.

2. BACKGROUND

This section explains the PageRank algorithm and the ant colony algorithm.

2.1 Page Rank Algorithm [2]:

The PageRank algorithm presented by Brin and Page is one of the factors used by Google to calculate the relative importance of web pages. The PageRank value of a web page depends on the PageRank values of the pages pointing to it and on the number of links going out of those pages. In this algorithm, web pages with more citations are more important. The advantage of PageRank is that it depends not only on the count of referrals but also on the importance of the pages that make them. We assume page A has pages T1...Tn which point to it (i.e., are citations).
The parameter d is a damping factor which can be set between 0 and 1; it is usually set to 0.85. C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

PR(A) = (1 - d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

PageRanks are usually described as forming a probability distribution over web pages, so that the sum of all web pages' PageRanks is one. Strictly, this holds when the (1 - d) term is divided by the total number of pages N; with the formula as written, the values sum to N when every page has at least one outgoing link. PR(A) can be calculated using a simple iterative algorithm, and corresponds to the principal eigenvector of the normalized link matrix of the web.

2.2 ANT COLONY:

Swarm intelligence is a field of computer science that designs and studies efficient computational methods for solving problems in a way inspired by the behavior of real swarms or insect colonies (see e.g. Bonabeau et al., 1999; Kennedy et al., 2001). The area of swarm intelligence relevant to such problems is ant colony optimization. The ant colony algorithm is inspired by the social life of ants. Individual ants are unintelligent and practically blind, but through their social structure [10] they complete the complex task of finding the shortest path without knowing that the problem exists. Ants store their food in a place called the nest. When they sense food nearby, their task is to carry it to the nest. Ants leave behind a chemical substance called pheromone, which lets other ants know that an ant has been there before. The amount of pheromone that an ant deposits is inversely proportional to the distance it has travelled, so ants that move along shorter paths secrete more pheromone per unit length [3]. This is shown in Fig. 1.

Fig 1: The Ant's way

3. PROPOSED ALGORITHM:

Initially, virtual ants (users) visit web pages at random. Some pheromone quantity is associated with each web page [7]. Here pheromone represents user interest in the web page, calculated from clicks on its link.
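For comparison with the pheromone-based scheme, the iterative PageRank computation of Section 2.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used by any search engine: the function name `pagerank`, the three-page graph `toy_web` and the fixed iteration count are hypothetical, and every page is assumed to have at least one outgoing link.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1 - d) + d * sum(PR(T) / C(T)) over all pages.

    `links` maps each page to the list of pages it links to (assumed
    non-empty for every page, with all targets present as keys).
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # arbitrary starting values
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T) / C(T) over every page T that links to `page`.
            incoming = sum(
                pr[t] / len(links[t]) for t in pages if page in links[t]
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical toy graph: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
```

On this toy graph page C receives the highest value because both A and B cite it, and the three values sum to the number of pages, as expected for the un-normalized form of the formula.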
The pheromone quantity changes dynamically with user visits to the web page. A virtual ant (user) decides to visit a web page based on the pheromone quantity and the short description of the web page. The pheromone of each web page decreases by a constant factor with time. An ant takes its decision randomly, where the probability equals the pheromone of a page relative to the sum of all pheromone values. The probability of a user (ant) choosing web page i is:

$p_i = \frac{\tau_i}{\sum_{j \in S} \tau_j}$

For more complex problems, problem-dependent heuristic information can additionally be used to give the ant hints about which item to choose next. To each pheromone value a corresponding heuristic value is defined. In AntRank the heuristic [9] is the attractiveness of the short description of the web page that is provided in the result list for the given keyword. In the standard ant colony formulation the two values are combined as:

$p_i = \frac{\tau_i \, \eta_i}{\sum_{j \in S} \tau_j \, \eta_j}$

where
$\tau_i$: pheromone value, updated for every trail segment visited by the ant (user). Over time, the web pages relevant to a particular keyword accumulate more and more user clicks (pheromone), increasing their probability of being chosen.
$\eta_i$: heuristic value based on the attractiveness of the web page for a visit.
$S$: the set of all possible web pages for the given keyword k.

Pheromone Updating: Whenever an ant visits a web page, the amount of pheromone on it is updated. The amount of pheromone is increased in proportion to the quality of the web page. Consistent with step 10 of the algorithm below, the deposit is added to the current value:

$\tau_i(t+1) = \tau_i(t) + \Delta\tau_i(t)$

where $\tau_i(t)$ is the pheromone value at time t and $\Delta\tau_i(t)$ is the amount of pheromone left by the ant at time t. The deposit changes depending on users' interest, measured in clicks.

3.1 ANTRANK ALGORITHM:

Algorithm:
1. Initialization of values:
   S = 0; t = current time; t[] = null;
   initial pheromone value = 100 (100 is taken as the threshold value);
   T = time stamp (15 days is considered)
2. t[] = tokens(k)
3. while (each(t[]) matches with URL || meta)
       Display as hyperlink to the web page;
       Increment S until the end of the web pages;
4. end while
5. The set of web pages that match the given keyword is S = {1, 2, 3, ..., n}
6. for all S = {1, 2, ..., n}
7.     choose web page i with probability $p_i$
8.     repeat
9.         choose next web page j ∈ S with probability $p_j$
       until S = null
10. Procedure after probation:
    t[] = tokens(k)
    while (each(t[]) matches with URL || meta)
        Consider pheromone updating, calculated as initial value + current user interest (i.e. the ant puts pheromone on the web page).
11. Calculate pheromone evaporation
12. For choosing the relevant web page do
        Consider pheromone updating and pheromone evaporation
        Display as hyperlink to the web page
13. repeat steps 10 to 12

4. IMPLEMENTATION OF PROPOSED SYSTEM:

The proposed algorithm has been implemented on a data set of 5000 URLs, collected from various categories (ref. Alexa.com). The implementation was done in two phases.

Phase - 1: Calculation of users' interest.
Phase - 2: Assigning ranking to web pages using users' interest.

Phase - 1: Calculation of users' interest. The 5000 URLs were collected and hosted on a server named www.goongo.in. www.goongo.in was popularized through social media and word of mouth. A time stamp of 15 days was considered: the users' interest over every 15 days of usage of www.goongo.in was observed. In the proposed algorithm the user plays the role of the ant and the user's interest plays the role of pheromone; its calculation is described in Section 3.

Phase - 2: Assigning ranking to web pages using users' interest. Pheromone evaporation and updating are calculated, and rankings are assigned to the web pages based on the pheromone values.

For the implementation, the terms of the ant colony algorithm were mapped to their counterparts in the search engine, as listed in Table 2.
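The selection, update and evaporation rules of Section 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the function names, the evaporation rate of 0.1, the click counts and the example URLs are all hypothetical, the heuristic is combined with the pheromone as a simple product, and evaporation is modeled as multiplication by a constant factor before the clicks of the period are added.

```python
import random

INITIAL_PHEROMONE = 100.0  # initial pheromone value from step 1 of the algorithm


def selection_probabilities(pheromone, heuristic=None):
    """Probability of an ant (user) choosing each page.

    Without heuristic: p_i = tau_i / sum_j tau_j.
    With heuristic (assumed product form): p_i is proportional to tau_i * eta_i.
    """
    weights = {
        page: tau * (heuristic[page] if heuristic else 1.0)
        for page, tau in pheromone.items()
    }
    total = sum(weights.values())
    return {page: weight / total for page, weight in weights.items()}


def choose_page(pheromone, heuristic=None, rng=random):
    """Pick one page at random according to the selection probabilities."""
    probs = selection_probabilities(pheromone, heuristic)
    pages = list(probs)
    return rng.choices(pages, weights=[probs[p] for p in pages], k=1)[0]


def update_pheromone(pheromone, clicks, evaporation=0.1):
    """Evaporate every page's pheromone, then deposit the new clicks."""
    return {
        page: (1.0 - evaporation) * tau + clicks.get(page, 0)
        for page, tau in pheromone.items()
    }


def rank_pages(pheromone):
    """Order pages by decreasing pheromone (user interest)."""
    return sorted(pheromone, key=pheromone.get, reverse=True)


# Hypothetical result set for one keyword; all pages start at the initial value.
pages = ["a.example", "b.example", "c.example"]
pheromone = {page: INITIAL_PHEROMONE for page in pages}

# Hypothetical clicks observed during one 15-day period.
clicks = {"a.example": 5, "b.example": 40}
pheromone = update_pheromone(pheromone, clicks)

ranking = rank_pages(pheromone)
probabilities = selection_probabilities(pheromone)
```

With these made-up numbers the page that received the most clicks moves to the top of the ranking, while the page with no clicks loses pheromone through evaporation alone.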
Table 2: Shows AntRank considering the facts as follows:

In Algorithm    Implementation
Ant             User
Path            Web Page
Pheromone       Interest

Fig. 2: Shows database maintenance

Fig 3: Shows before ant colony for the given keyword "movies"

Fig 3: Shows after ant colony for the given keyword "movies"

Table 3: Shows the difference in the search results obtained for the same keyword "News". The table shows the variation in the top five results obtained with the PageRank and AntRank algorithms.

PAGE RANK                  ANTRank
m2newmedia.com             ndtv.com
newsvine.com               thehindu.com
bharat-rakshak.com/IAF/    eenadu.net
cgitoronto.ca              indianexpress.com
telecomindiaonline.com     screenindia.com

5. CONCLUSION:

This paper has proposed an algorithm for ranking web pages called ANTRANK. The goal of the algorithm is to assign ranks to web pages based on users' interest (pheromone). The proposed algorithm is based on the behavior of real ant colonies, a concept applied here to web mining. Following the ant colony algorithm, pheromone updating corresponds to users' interest in clicking on a web link, and pheromone evaporation corresponds to users' lack of interest. After the stipulated probation period the weights on the web pages vary drastically. Once the probation period is over, the actual page ranking is applied based on the interest values of the web pages crawled by the engine. The values on the web pages do not stay the same over time. We compare the evaporation of pheromone in an ant colony to reducing the user interest on the web page manually and periodically. The algorithm thereby avoids the drawback in which a high-weight web page references a low-quality page and so raises that page's ranking position.

REFERENCE:

[1] http://www.deri.ie/sites/default/files/event_local_material/tutorial-webmining.pdf
[2] http://www.cs.princeton.edu/chazelle/courses/BIB/pagerank.htm
[3] M. Dorigo and G. Di Caro, "The ant colony optimization meta-heuristic," In: New Ideas in Optimization, D. Corne, M. Dorigo and F. Glover, Eds. London, UK: McGraw Hill, pp. 11-32, 1999.
[4] Rafael S. Parpinelli, Heitor S. Lopes, "Data Mining With an Ant Colony Optimization Algorithm," IEEE Transactions on Evolutionary Computation, Vol. 6, No. 4, August 2002.
[5] http://www.foresee.com/research-white-papers/_downloads/annual-e-business-report-acsi-2013-foresee.pdf
[6] Ashish Jain, Rajeev Sharma, Gireesh Dixit and Varsha Tomar, "Page Ranking Algorithms in Web Mining, Limitations of Existing Methods and a New Method for Indexing Web Pages," International Conference on Communication Systems and Network Technologies, 978-0-7695-4958-3/2013 IEEE.
[7] Sara Setayesh, Ali Harounabadi and Amir Masoud Rahmani, "Presentation of an Extended Version of the Page Rank Algorithm to Rank Web Pages Inspired by Ant Colony Algorithm," International Journal of Computer Applications (0975-8887), Volume 85, No. 17, January 2014.
[8] Chonawat Srisa-An, "Applying Ant Colony Algorithm for Page Ranking Calculation," NCIT-2010.
[9] Ivan Brezina Jr. and Zuzana Cickova, "Solving the Travelling Salesman Problem Using the Ant Colony Optimization," Management Information Systems, Vol. 6 (2011), No. 4, pp. 010-014.
[10] Daniel Merkle and Martin Middendorf, "Swarm Intelligence," Department of Computer Science, University of Leipzig, Germany.
[11] Ali Mohammad Zareh Bidoki, Nasser Yazdani, "Distance Rank: An intelligent ranking algorithm for web pages," Information Processing and Management 44 (2008) 877-892.
[12] Songhua Xu, Yi Zhu, Hao Jiang and Francis C.M. Lau, "A User-Oriented Webpage Ranking Algorithm Based on User Attention Time," Proceedings of the Twenty-Third AAAI Conference on Artificial Intelligence (2008).

AUTHORS

G. Anuradha did her M.Tech at GITAM College of Engineering, Visakhapatnam, Andhra Pradesh. She is working as an Associate Professor in the Department of CSE at GMRIT, Rajam, Srikakulam district. She is pursuing a Ph.D. at Andhra University on web mining. She has published one paper in an international journal and more than five papers in various national and international conferences.

Lavanya Devi Golagani received her B.Tech degree from Nagarjuna University in 2000 and her M.Tech degree from Andhra University in 2002. She is presently working as Assistant Professor in the Dept. of Computer Science & Systems Engineering at Andhra University College of Engineering, Visakhapatnam. She has more than 10 years of teaching experience. She has published more than 15 papers in various national and international journals and more than 20 in conferences. Her research interests include Network Security, Data Mining and Bioinformatics.

Prof. M. Surendra Prasad Babu received his M.Phil and Ph.D. degrees from Andhra University. He is presently working as Professor in the Dept. of Computer Science & Systems Engineering at Andhra University College of Engineering, Visakhapatnam. He has more than 30 years of teaching experience and has published more than 120 papers in various national and international journals and conferences. His research areas include Artificial Intelligence & Expert Systems, Machine Learning, Computer Networks, Neural Networks, Object Oriented Technologies, Web Technologies, Pervasive Computing and Similarity Transformations.