Incremental Learning Method for Data with Delayed Labels

keywords: Delayed labels, transfer learning, concept drift, incremental learning, credit scoring
Most research on machine learning tasks relies on the availability of true labels immediately after making a prediction. However, in many cases, the ground truth labels become available with a non-negligible delay. In general, delayed labels create two problems. First, labelled data is insufficient because the label for each data chunk will be obtained multiple times. Second, there remains a problem of concept drift due to the long period of data. In this work, we propose a novel incremental ensemble learning when delayed labels occur. First, we build a sliding time window to preserve the historical data. Then we train an adaptive classifier by labelled data in the sliding time window. It is worth noting that we improve the TrAdaBoost to expand the data of the latest moment when building an adaptive classifier. It can correctly distinguish the wrong types of source domain sample classification. Finally, we integrate the various classifiers to make predictions. We apply our algorithms to synthetic and real credit scoring datasets. The experiment results indicate our algorithms have superiority in delayed labelling setting.
mathematics subject classification 2000: 68T10, 68U35
reference: Vol. 41, 2022, No. 5, pp. 1260–1283