What is Your Code Clone Detection and Evolution Research Made Of?
keywords: Code clone detection, clone evolution, scientific workflows, building blocks, experimental testbed
Over the past few decades, clone detection and evolution have become a major area of study in software engineering. Clone detection experiments present several challenges to researchers, such as collecting data accurately, selecting appropriate clone detection algorithms, and understanding clone evolution phenomena. This paper aims to facilitate clone detection and evolution research by providing a structured and systematic mechanism for conducting experiments. Clone detection experiments usually consist of several tasks, such as fetching data from a version control system, performing the necessary pre-processing activities, and feeding the data to a clone detection algorithm. A particular clone detection experiment can therefore be interpreted as a meaningful combination of such tasks into a scientific workflow. In this work, the concrete tasks in a code clone detection workflow are referred to as Building Blocks. This paper presents a collection of Building Blocks identified through a systematic literature review, and a conceptual framework of an experimental testbed to facilitate clone detection experiments. The reusability of the Building Blocks was validated using four case studies selected from the literature. The validation results confirm the reusability and expressiveness of the Building Blocks when applied to new experiments. In addition, the proposed experimental testbed proved beneficial for conducting and replicating clone detection experiments.
reference: Vol. 40, 2021, No. 3, pp. 690–728