Web Person Name Disambiguation Using Social Links and Enriched Profile Information

Hojjat Emami

Social Network and Intelligent Systems Laboratory, Department of Information and Communication Technology (ICT), Malek-Ashtar University of Technology, Tehran, Iran
Hossein Shirazi

Social Network and Intelligent Systems Laboratory, Department of Information and Communication Technology (ICT), Malek-Ashtar University of Technology, Tehran, Iran
Ahmad Abdollahzadeh Barforoush

Intelligent Systems Laboratory, Computer Engineering and IT Department, Amirkabir University of Technology, Tehran, Iran

keywords: Web mining, cross-document name disambiguation, social links, profile enrichment, clustering

In this article, we investigate the problem of cross-document person name disambiguation, which aims to resolve ambiguities between person names and to cluster web documents according to the different persons sharing the same name to whom they refer. Most previous work formulates cross-document name disambiguation as a clustering problem. These methods employ various syntactic and semantic features, drawn either from the local corpus or from distant knowledge bases, to compute similarities between entities and to group similar entities. However, these approaches show limitations in robustness and performance. We propose an unsupervised, graph-based name disambiguation approach to improve the performance and robustness of the state of the art. Our approach exploits both local information extracted from the given corpus and global information obtained from distant knowledge bases. We demonstrate its effectiveness by testing it on the standard WePS datasets. The experimental results are encouraging: the proposed method outperforms several baseline methods as well as its counterparts, improving both the performance and the robustness of name disambiguation.

mathematics subject classification 2000: 97R40, 97R50, 68T50, 68U35, 90B40

reference: Vol. 37, 2018, No. 6, pp. 1485–1515

doi: 10.4149/cai_2018_6_1485

Computing and Informatics

formerly Computers and Artificial Intelligence
