Compact Indexes Based on Core Content in Personal Dataspace Management System

keywords: Keyword query, indexing, result quality, semantic analysis, personal dataspace management system
A Personal Dataspace Management System is a platform for managing personal data of heterogeneous types, in which keyword query is the primary query form for users who know little about the structure of the dataspace. Unlike exploratory queries in web search, a user searching a personal dataspace usually has a specific target in mind and wants to find items already known to them. To improve result quality in terms of query relevance in a personal dataspace, we propose the concept of compact indexes in this paper. We refer to the most important and representative semantics of a document as its core content, and build compact indexes on it. We propose an algorithm for selecting core content from a document based on semantic analysis, which processes English and Chinese documents uniformly. Furthermore, we introduce a software platform named Versatile for flexible personal data management, in which core content is extracted to build compact indexes and to generate query-biased snippets efficiently and accurately. Finally, extensive experiments have been conducted to show the effectiveness and feasibility of compact indexes in personal dataspace management systems.
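To make the idea of a compact index concrete, the following is a minimal sketch, not the paper's method. It replaces the semantic-analysis step with a plain stand-in: the hypothetical function `select_core_terms` keeps the k most frequent non-stopword terms of each document, and only those terms are placed in an inverted index. All names, the stopword list, and the parameter `k` are assumptions for illustration, and the tokenizer handles English only, whereas the abstract states that the real algorithm treats English and Chinese uniformly.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with"}


def tokenize(text):
    """Lowercase English tokenizer; drops stopwords. Does not handle Chinese."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def select_core_terms(text, k=5):
    """Stand-in for core-content selection: the k most frequent terms.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(tokenize(text))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {term for term, _ in ranked[:k]}


def build_compact_index(docs, k=5):
    """Inverted index over core terms only, instead of every term in each document."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in select_core_terms(text, k):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Conjunctive keyword query: documents whose core terms contain every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


if __name__ == "__main__":
    docs = {
        "d1": "budget report budget spreadsheet for the project budget review",
        "d2": "meeting notes meeting agenda notes mention budget",
    }
    index = build_compact_index(docs, k=2)
    print(sorted(search(index, "budget")))         # ['d1']
    print(sorted(search(index, "meeting notes")))  # ['d2']
```

In this toy run, a full-text index would return both documents for "budget", while the compact index returns only `d1`, where the term is central rather than incidental. This mirrors the relevance argument in the abstract under the stated frequency-based assumption; the paper's own selection relies on semantic analysis, which this sketch does not reproduce.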
reference: Vol. 33, 2014, No. 2, pp. 281–302