An Efficient Method of Summarizing Documents Using Impression Measurements
keywords: Impressive expressions, NMF methods, precision, relevancy
Automatic generic document summarization based on unsupervised schemes is a very useful approach because it does not require training data. Although latent semantic analysis (LSA) and non-negative matrix factorization (NMF) have been applied to identify the topics of documents, no research has addressed reducing the matrix size or speeding up the computation of the NMF method. To this end, this paper uses generic impressive expressions found in newspapers to extract important sentences as a summary. Consequently, the method requires neither stemming nor stop-word filtering. Novels are typical documents that leave readers with sentimental impressions. Newspapers, by contrast, leave impressions of new knowledge, because they inform readers about current events and carry informative articles and diverse features. The proposed method defines impressive expressions for newspapers and applies their measurements to the NMF method. Experimental results on 100 KB of text data show that the proposed method reduces the matrix size by 80 % and makes the NMF computation 7 times faster than the original method, without degrading the relevancy of the extracted sentences.
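The general idea of restricting the NMF input matrix to a list of weighted expressions can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the expression list and its weights (`IMPRESSION_WEIGHTS`) are hypothetical placeholders for the paper's impressive expressions and their measurements, the factorization uses the standard Lee-Seung multiplicative updates for the Frobenius loss, and sentences are selected by taking, for each topic, the sentence with the largest coefficient in `H`. That selection rule is a simple stand-in and may differ from the one the paper uses. All function names are hypothetical.

```python
import random
import re

# HYPOTHETICAL impression weights: placeholders for the paper's impressive-expression
# dictionary and measurements, which are not reproduced here.
IMPRESSION_WEIGHTS = {
    "first": 1.5, "history": 1.5, "historic": 2.0,
    "record": 2.0, "flooded": 1.0, "celebrated": 1.0,
}

SENTENCES = [
    "The team won the championship for the first time in history.",
    "Officials said the record rainfall flooded the city.",
    "Fans celebrated the historic first title downtown.",
    "The record storm closed schools across the city.",
]


def build_matrix(sentences, weights):
    """Expression-by-sentence matrix whose rows are only the weighted expressions.

    Restricting rows to the expression list (instead of the full vocabulary) is what
    shrinks the matrix; no stemming or stop-word filtering is applied.
    """
    terms = sorted(weights)
    tokens = [re.findall(r"[a-z]+", s.lower()) for s in sentences]
    A = [[weights[t] * toks.count(t) for toks in tokens] for t in terms]
    return terms, A


def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    return [list(row) for row in zip(*X)]


def nmf(A, r, iters=200, seed=0, eps=1e-9):
    """Factor A (m x n) into non-negative W (m x r) and H (r x n), A ~ W H,
    using Lee-Seung multiplicative updates for the Frobenius loss."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(r)] for _ in range(m)]
    H = [[rng.random() + eps for _ in range(n)] for _ in range(r)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(r)]
        # W <- W * (A H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(r)]
             for i in range(m)]
    return W, H


def summarize(sentences, weights, r=2):
    """Return r sentences: for each topic (row of H, strongest topic first),
    the not-yet-chosen sentence with the largest coefficient."""
    _, A = build_matrix(sentences, weights)
    _, H = nmf(A, r)
    chosen = []
    for topic in sorted(H, key=sum, reverse=True):
        for j in sorted(range(len(sentences)), key=lambda j: -topic[j]):
            if j not in chosen:
                chosen.append(j)
                break
    return [sentences[j] for j in sorted(chosen)]


if __name__ == "__main__":
    for sentence in summarize(SENTENCES, IMPRESSION_WEIGHTS):
        print(sentence)
```

With the hypothetical list above, the matrix has one row per weighted expression rather than one per distinct word in the text, which is the kind of reduction that makes each NMF iteration cheaper. The rank `r` also fixes how many sentences are extracted; a real system would choose it per document.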
reference: Vol. 32, 2013, No. 2, pp. 371–391