Intelligent web monitoring - A hypertext mining-based approach

Manas A Pathak, Vivek S Thakre

Abstract



The World Wide Web has become one of the principal sources of information since its inception. With large amounts of content continually added and deleted, the volume of change in hypertextual data is massive. This rapidly changing nature of the WWW makes tracking information manually intractable. In this paper we propose an approach for intelligently monitoring a website for changes, taking the user's interests into consideration and ranking the changes according to relevance. A prototype system, WebMon, based on this approach is presented.

WebMon consists of basic components that perform infrastructural activities, such as crawlers and indexers. It also takes as input keyword weights based on the user's interests. It then represents the hypertextual data of the website as a vector space model (VSM). This process is repeated periodically to obtain the VSM representing the website's hypertextual data at each instant of time. To monitor for changes, the VSMs from different instants are compared and the corresponding changes are ranked by their relevance to the user, using a modified nearest neighbor (NN) algorithm. To further improve the accuracy and self-adjustability of the relevance rankings, the system employs a modified supervised learning algorithm that takes the user's behavior into account. The WebMon system has been tested extensively on many websites and gave the expected results. In this paper we report some experimental results showing the effectiveness of the proposed approach.
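
The snapshot comparison described above can be illustrated with a minimal sketch. It is not the WebMon implementation: the abstract does not specify the modified nearest neighbor algorithm, so plain cosine distance between keyword-weighted term vectors stands in for it, and the supervised learning step is left out. The function names (`weighted_vector`, `cosine_distance`, `rank_changes`) and the representation of a snapshot as a URL-to-text mapping are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def weighted_vector(text, keyword_weights):
    """Build a sparse vector over the user's keywords: term count times weight."""
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    return {
        term: counts[term] * weight
        for term, weight in keyword_weights.items()
        if counts[term]
    }


def cosine_distance(u, v):
    """Return 1 - cosine similarity; two all-zero vectors count as unchanged."""
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 and norm_v == 0.0:
        return 0.0
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    dot = sum(x * v.get(term, 0.0) for term, x in u.items())
    return 1.0 - dot / (norm_u * norm_v)


def rank_changes(old_snapshot, new_snapshot, keyword_weights):
    """Score every page seen in either snapshot and sort by descending score.

    Snapshots are hypothetical dicts mapping a URL to the page text
    crawled at one instant of time.
    """
    scores = []
    for url in set(old_snapshot) | set(new_snapshot):
        u = weighted_vector(old_snapshot.get(url, ""), keyword_weights)
        v = weighted_vector(new_snapshot.get(url, ""), keyword_weights)
        scores.append((url, cosine_distance(u, v)))
    # Ties are broken by URL so the ordering is deterministic.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


old = {"/news": "python release notes", "/about": "company history"}
new = {"/news": "python security patch", "/about": "company history"}
weights = {"python": 1.0, "security": 3.0}
ranked = rank_changes(old, new, weights)
```

In this example the `/news` page ranks first because a heavily weighted keyword appears in the new snapshot, while `/about` scores zero because none of the user's keywords occur in either version. A change that touches no weighted keyword is deliberately invisible to this sketch; whether WebMon behaves the same way is not stated in the abstract.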

Keywords


Relevance ranking; vector space model; nearest neighbors; supervised learning
