Danmaku Text Clustering Algorithm Based on Feature Extension and Word-Pair Filtering OBTM

Di Wu

Department of Information and Electronic Engineering, Hebei University of Engineering, Handan, Hebei, China

Zhuyun Huang

Department of Information and Electronic Engineering, Hebei University of Engineering, Handan, Hebei, China


keywords: Danmaku text, short text clustering, feature extension, OBTM, new word discovery

Danmaku text clustering is an active research topic in online video reviews. To address the unsatisfactory clustering accuracy caused by short texts and the many new words they contain, a danmaku text clustering algorithm based on feature extension and word-pair filtering OBTM is proposed. First, a new-word discovery algorithm based on weight optimization is proposed to retain the features of new words in the danmaku text. Then, the internal information and external knowledge of the new words are used to extend the features of the danmaku text and reduce feature sparsity. Next, an OBTM topic model based on word-pair filtering is designed to eliminate noise features. Finally, a Single-Pass algorithm based on cluster center iteration is proposed to obtain the clustering results of the topic feature words. Experimental results show that the clustering accuracy of the proposed algorithm is 13.33%, 8.52%, and 6.25% higher than that of the OBTM, Word2vec+BTM, and OurE.Drift* algorithms, respectively.
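
The final clustering step can be illustrated with a minimal Single-Pass sketch. It is not the implementation evaluated in this paper: the cosine similarity measure, the threshold value, the running-mean update of the cluster center, and the names `cosine` and `single_pass` are all assumptions made for illustration only.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(vectors, threshold=0.8):
    """Assign each vector, in arrival order, to the most similar cluster.

    Assumption: a cluster center is the running mean of its members and is
    updated after every assignment. The threshold is a hypothetical value.
    """
    centers = []  # current center of each cluster
    sizes = []    # number of members in each cluster
    labels = []   # cluster index assigned to each input vector
    for v in vectors:
        best, best_sim = -1, -1.0
        for i, c in enumerate(centers):
            s = cosine(v, c)
            if s > best_sim:
                best, best_sim = i, s
        if best >= 0 and best_sim >= threshold:
            # Join the closest cluster and move its center toward the new vector.
            n = sizes[best]
            centers[best] = [(ci * n + vi) / (n + 1)
                             for ci, vi in zip(centers[best], v)]
            sizes[best] = n + 1
            labels.append(best)
        else:
            # No cluster is similar enough, so the vector starts a new one.
            centers.append(list(v))
            sizes.append(1)
            labels.append(len(centers) - 1)
    return labels, centers
```

In this sketch each input vector stands for the topic feature representation of one danmaku text. A higher threshold produces more, smaller clusters, and the result depends on the order in which the vectors arrive.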

reference: Vol. 41, 2022, No. 3, pp. 788–812

doi: 10.31577/cai_2022_3_788

Computing and Informatics

formerly Computers and Artificial Intelligence
