Intelligent Support for Information Retrieval of Web Documents

Robert Kovaľ

Department of Computer Science and Engineering
Slovak University of Technology in Bratislava
Ilkovičova 3, 812 19 Bratislava, Slovakia

Pavol Návrat

Department of Computer Science and Engineering
Slovak University of Technology in Bratislava
Ilkovičova 3, 812 19 Bratislava, Slovakia


keywords: Intelligent information retrieval, suffix tree clustering algorithm, clickstream analysis, web tool, search agent

The main goal of this research was to investigate means of intelligent support for the retrieval of web documents. We propose the architecture of a web tool system, Trillian, which discovers users' interests without requiring any explicit interaction from them and uses those interests to search autonomously for related web content. The pages it discovers are then suggested to the user. Interests are discovered by analyzing the documents the user has previously visited. We have created a module for fully transparent tracking of the user's movement on the web, which logs both the visited URLs and the contents of the web pages. The subsequent analysis step is based on a variant of the suffix tree clustering algorithm. We focus primarily on the overall design of the Trillian architecture and on the process of discovering topics of interest. We have implemented an experimental prototype of Trillian and evaluated the quality, speed and usefulness of the proposed system. We have shown that clustering is a feasible technique for extracting interests from web documents. We consider the proposed architecture promising and suitable for future extensions.

reference: Vol. 21, 2002, No. 5, pp. 509–528

Computing and Informatics

formerly Computers and Artificial Intelligence
