Enhancing Semantic Web Entity Matching Process Using Transformer Neural Networks and Pre-Trained Language Models

keywords: Entity matching, record linkage, linked data, deep learning, transformer neural networks
Entity matching (EM) is a critical yet complex component of data cleaning and integration. Recent advancements in EM have predominantly been driven by deep learning (DL) methods. These methods primarily enhance data accuracy within structured data that adheres to a high-quality and well-defined schema. However, these schema-centric DL strategies struggle with the semantic web's linked data, which tends to be voluminous, semi-structured, diverse, and often noisy. To tackle this, we introduce a novel approach that is loosely schema-aware and leverages cutting-edge developments in DL, specifically transformer neural networks and pre-trained language models. We evaluated our approach on six datasets, including two tabular and four RDF datasets from the semantic web. The findings demonstrate the effectiveness of our model in managing the complexities of noisy and varied data.
reference: Vol. 43, 2024, No. 6, pp. 1397–1415