AN IMPROVED PAGERANK ALGORITHM BASED ON A HYBRID APPROACH

Abstract
PageRank is an algorithm that brings order to the Internet by returning the most relevant results for a user's search query. It scores each webpage from the links that point to it, weighting every such link by the number of outgoing links on the linking page, and uses the score as an indicator of the page's relevance. Two problems remain. First, computing the page rank of every webpage takes a long time, because the number of webpages on the Internet is very large and keeps growing. Second, the results are biased towards old webpages: newly created webpages receive lower page ranks than old ones even though they may contain more relevant information. To overcome these setbacks, this research proposes an alternative hybrid algorithm based on an optimized normalization technique and a content-based approach. The proposed algorithm reduces the number of iterations required to calculate the page rank, and hence improves efficiency, by calculating the mean of all page rank values and normalising each page rank value by that mean. This is complemented by ranking web pages according to the validity of their links rather than by conventional popularity.

Research Highlights
The PageRank algorithm is employed in search engines to retrieve the best results for a query by using the structure of the Internet graph to rank the importance of a webpage (1). The algorithm has two problems: it favors old webpages, so new, informative webpages are given lower priority, and it requires a lot of time to calculate the true page rank of a webpage (2). The aim of this paper is to address these issues by proposing an enhanced PageRank algorithm that combines an optimized normalization technique (3) with a content-based approach (4). The optimized normalization technique is used because it speeds up convergence, so the page ranks reach their final true values in fewer iterations than with the conventional PageRank algorithm. The content-based approach makes the rank distribution depend on the validity of links rather than on conventional popularity, which removes the bias towards old webpages.
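
For reference, the conventional PageRank score mentioned in these highlights is commonly written in the form given by Brin and Page. The notation is the standard one and is an assumption about the baseline, not taken from this paper: T_1, ..., T_n are the pages that link to page A, C(T_i) is the number of outgoing links on T_i, and d is a damping factor, customarily set to 0.85.

```latex
PR(A) = (1 - d) + d \sum_{i=1}^{n} \frac{PR(T_i)}{C(T_i)}
```

The score of every page depends on the scores of the pages linking to it, so the values are obtained by repeating this update until they stop changing. The number of repetitions is the iteration count that the proposed algorithm seeks to reduce.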

Graphical Abstract
The figure shows the iterations needed for the proposed PageRank algorithm to calculate the final true page ranks of all webpages. In terms of iterations, the proposed algorithm is 74% faster than other PageRank algorithms: it requires only 27 iterations to calculate the final true page ranks of all webpages.

Research Objectives
This research aims to propose an enhanced algorithm that returns results without favoring old webpages. This objective is tested by comparing the ranking positions obtained with the proposed algorithm against those obtained with the PageRank algorithm. In addition, the enhanced algorithm aims to decrease the number of iterations needed to calculate the page rank. This objective is tested by counting the iterations needed to calculate the final page rank of all the webpages.
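
The comparison of ranking positions could be carried out along the following lines. This is a minimal sketch, not the evaluation code used in this research: the function names and the three example scores are hypothetical, and ties are broken alphabetically as an arbitrary choice.

```python
def rank_positions(scores):
    """Map each page to its position (1 = highest score)."""
    # Sort by descending score; break ties by page name (arbitrary choice).
    ordered = sorted(scores, key=lambda page: (-scores[page], page))
    return {page: position for position, page in enumerate(ordered, start=1)}


def position_changes(baseline_scores, proposed_scores):
    """Positive value: the page moved up under the proposed algorithm."""
    baseline = rank_positions(baseline_scores)
    proposed = rank_positions(proposed_scores)
    return {page: baseline[page] - proposed[page] for page in baseline}


# Hypothetical scores, for illustration only (not results from this research).
baseline_scores = {"old": 1.8, "mid": 1.0, "new": 0.4}
proposed_scores = {"old": 1.1, "mid": 0.7, "new": 1.2}
changes = position_changes(baseline_scores, proposed_scores)
print(changes)
```

A page with a positive change is ranked higher by the proposed algorithm than by the baseline, which is the effect expected for new but relevant webpages.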

Methodology
The proposed algorithm starts by calculating the mean of all page rank values and normalizing each page rank value by the mean. It then calculates the valid links of the webpages. This technique is used because, with valid links, the rank distribution will be based on the validity of links rather than on popularity.
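
One update of such a hybrid scheme might look like the sketch below. Several points are assumptions rather than details given in this section: normalization is taken to mean dividing every page rank by the mean, the validity of each link is assumed to be decided beforehand and supplied as a set of (source, target) pairs, and the damping factor d = 0.85 and the name `hybrid_step` are illustrative.

```python
from statistics import mean


def hybrid_step(ranks, out_links, valid_links, d=0.85):
    """Run one update: share rank over valid links only, then divide by the mean.

    ranks       -- dict page -> current page rank
    out_links   -- dict page -> list of pages it links to
    valid_links -- set of (source, target) pairs judged valid (assumed given)
    d           -- damping factor (assumed 0.85, the customary value)
    """
    new_ranks = {page: 1 - d for page in ranks}
    for source, targets in out_links.items():
        # Keep only the links that passed the (hypothetical) validity check.
        valid_targets = [t for t in targets if (source, t) in valid_links]
        for target in valid_targets:
            new_ranks[target] += d * ranks[source] / len(valid_targets)
    # Mean normalization: rescale so that the average page rank is 1.
    average = mean(new_ranks.values())
    return {page: rank / average for page, rank in new_ranks.items()}


# Hypothetical three-page web; the link A -> B is treated as invalid.
out_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
valid_links = {("A", "C"), ("B", "C"), ("C", "A")}
ranks = hybrid_step({page: 1.0 for page in out_links}, out_links, valid_links)
```

Because the link from A to B is excluded, page B receives no rank from A in this example, which illustrates how validity rather than the raw link count drives the distribution.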

Results
The proposed algorithm reduces the number of iterations required to calculate the page rank, and hence improves efficiency, by calculating the mean of all page rank values and normalizing them by that mean. Through this approach, the algorithm is also able to determine the relevance of webpages based on the validity of links rather than popularity. These claims are supported by an experiment conducted on the proposed algorithm using a dummy web structure consisting of 12 webpages. The results showed that the traditional PageRank algorithm needs 74% more iterations than the proposed algorithm. The proposed algorithm returned a mean value of 1.00, compared to 1.32 for the traditional algorithm. These results indicate that the proposed algorithm saves a substantial amount of computing power while being more precise and unbiased.
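
An iteration count of this kind can be measured with a small harness such as the one below. It is a sketch under stated assumptions, not the experiment reported here: the five-page graph is hypothetical (the 12-page dummy structure is not reproduced in this section), and the convergence tolerance, the damping factor and the name `pagerank_iterations` are illustrative choices. The counts it produces depend on those choices and are not expected to match the reported figures.

```python
def pagerank_iterations(out_links, d=0.85, tol=1e-6, normalize=False, max_iter=1000):
    """Iterate until the largest rank change is below tol; return (ranks, iterations)."""
    pages = sorted(out_links)
    ranks = {page: 1.0 for page in pages}
    for iteration in range(1, max_iter + 1):
        new_ranks = {page: 1 - d for page in pages}
        for source, targets in out_links.items():
            for target in targets:
                new_ranks[target] += d * ranks[source] / len(targets)
        if normalize:
            # Assumed form of the normalization: divide every rank by the mean.
            average = sum(new_ranks.values()) / len(pages)
            new_ranks = {page: rank / average for page, rank in new_ranks.items()}
        delta = max(abs(new_ranks[page] - ranks[page]) for page in pages)
        ranks = new_ranks
        if delta < tol:
            return ranks, iteration
    return ranks, max_iter


# Hypothetical five-page web; page E has no outgoing links.
graph = {"A": ["B", "C"], "B": ["C", "E"], "C": ["A"], "D": ["C"], "E": []}
plain_ranks, plain_iters = pagerank_iterations(graph)
norm_ranks, norm_iters = pagerank_iterations(graph, normalize=True)
print("traditional:", plain_iters, "iterations; normalized:", norm_iters, "iterations")
```

Running both variants on the same graph with the same tolerance keeps the comparison fair, since the iteration count is sensitive to the stopping criterion.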

Findings
This paper has proposed an alternative hybrid PageRank algorithm based on an optimized normalization technique and a content-based approach. The proposed algorithm outperforms other PageRank algorithms in terms of the iterations needed and the quality of the webpages returned. In terms of iterations, the proposed algorithm requires fewer iterations to calculate the final true page ranks of all webpages and is therefore faster than other PageRank algorithms. In terms of quality, the proposed algorithm takes into account the valid links of the webpages; hence it produces the same results as the Web Content based PageRank algorithm but different results from the traditional PageRank algorithm.