Incremental Learning Method for Data with Delayed Labels

Haoran Gao

The MOE Key Laboratory of Embedded System and Service Computation, Tongji University, Shanghai, 201804, China
Zhijun Ding

The MOE Key Laboratory of Embedded System and Service Computation, Tongji University, Shanghai, 201804, China
Meiqin Pan

School of Business and Management, Shanghai International Studies University, Shanghai, 200083, China


keywords: Delayed labels, transfer learning, concept drift, incremental learning, credit scoring

Most machine learning research assumes that true labels become available immediately after a prediction is made. In many cases, however, the ground-truth labels arrive with a non-negligible delay. In general, delayed labels create two problems. First, labelled data are scarce, because the labels for each data chunk arrive in several batches rather than all at once. Second, concept drift can occur because the data span a long period. In this work, we propose a novel incremental ensemble learning method for settings with delayed labels. First, we build a sliding time window to preserve the historical data. Then we train an adaptive classifier on the labelled data in the sliding time window. Notably, we improve TrAdaBoost to augment the data of the most recent moment when building the adaptive classifier; the improved algorithm can distinguish between the different types of misclassification of source-domain samples. Finally, we integrate the resulting classifiers to make predictions. We apply our algorithms to synthetic and real credit scoring datasets. The experimental results indicate that our algorithms perform better in the delayed-labelling setting.
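
As an illustration only, the three stages summarised above (a sliding time window, one classifier per data chunk trained on whatever labels have arrived, and an ensemble prediction) can be sketched as follows. This is a minimal sketch under stated assumptions, not the method proposed in this paper: the names `CentroidClassifier` and `DelayedLabelEnsemble` are hypothetical, a toy nearest-centroid learner on one-dimensional features stands in for the adaptive classifier, the improved TrAdaBoost step is omitted, and the ensemble is assumed to be an unweighted majority vote.

```python
from collections import deque


class CentroidClassifier:
    """Toy stand-in base learner (assumption): nearest class centroid, 1-D features."""

    def fit(self, xs, ys):
        self.centroids = {}
        for label in set(ys):
            members = [x for x, y in zip(xs, ys) if y == label]
            self.centroids[label] = sum(members) / len(members)
        return self

    def predict(self, x):
        # Return the label whose centroid is closest to x.
        return min(self.centroids, key=lambda c: abs(x - self.centroids[c]))


class DelayedLabelEnsemble:
    """Hypothetical sketch: sliding window of chunks whose labels arrive late."""

    def __init__(self, window_size=3):
        # deque(maxlen=...) drops the oldest chunk when a new one is appended.
        self.window = deque(maxlen=window_size)

    def add_chunk(self, chunk_id, xs):
        # A new chunk enters the window with no labels and no model yet.
        self.window.append({"id": chunk_id, "xs": list(xs), "ys": {}, "model": None})

    def receive_labels(self, chunk_id, labels):
        """labels maps sample index -> label; several batches may arrive per chunk."""
        for chunk in self.window:
            if chunk["id"] == chunk_id:
                chunk["ys"].update(labels)
                idx = sorted(chunk["ys"])
                # Retrain this chunk's classifier on all labels received so far.
                chunk["model"] = CentroidClassifier().fit(
                    [chunk["xs"][i] for i in idx],
                    [chunk["ys"][i] for i in idx],
                )

    def predict(self, x):
        # Unweighted majority vote over chunks that already have a model.
        votes = [c["model"].predict(x) for c in self.window if c["model"] is not None]
        if not votes:
            return None
        return max(sorted(set(votes)), key=votes.count)
```

In this sketch a chunk contributes to the vote only once at least one batch of its labels has arrived, and its classifier is refitted each time a further batch comes in; chunks that leave the window stop contributing, which is one simple way of limiting the influence of outdated data under concept drift.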

mathematics subject classification 2000: 68T10, 68U35

reference: Vol. 41, 2022, No. 5, pp. 1260–1283

doi: 10.31577/cai_2022_5_1260

Computing and Informatics

formerly Computers and Artificial Intelligence
