An Algorithm for the Generation of Segmented Parametric Software Estimation Models and Its Empirical Evaluation

keywords: Parametric software estimation, software project databases, clustering algorithms, EM algorithm
Parametric software effort estimation techniques use mathematical cost-estimation relationships derived from historical project databases, usually obtained through standard curve regression techniques. Nonetheless, project databases -- especially in the case of consortium-created compilations like the ISBSG --, collect highly heterogeneous data, coming from projects that diverge in size, process and personnel skills, among other factors. This results in that a single parametric model is seldom able to capture the diversity of the sources, in turn resulting in poor overall quality. Segmented parametric estimation models use local regression to derive one model per each segment of data with similar characteristics, improving the overall predictive quality of parametrics. Further, the process of obtaining segmented models can be expressed in the form of a generic algorithm that can be used to produce candidate models in an automated process of calibration from the project database at hand. This paper describes the rationale for such algorithmic scheme along with the empirical evaluation of a concrete version that uses the EM clustering algorithm combined with the common parametric exponential model of size-effort, and standard quality-of-adjustment criteria. Results point out to the adequacy of the technique as an extension of existing single-relation models.
reference: Vol. 26, 2007, No. 1, pp. 1–15