Reducing the Effect of Imbalance in Text Classification Using SVD and GloVe with Ensemble and Deep Learning

Tajbia Hossain

Department of Computer Science and Engineering, Ahsanullah University of Science and Technology, Dhaka, Bangladesh

Humaira Zahin Mauni

Department of Computer Science and Engineering, Ahsanullah University of Science and Technology, Dhaka, Bangladesh

Raqeebir Rab

Department of Computer Science and Engineering, Ahsanullah University of Science and Technology, Dhaka, Bangladesh

keywords: Deep learning, ensemble learning, machine learning, text classification, imbalanced data, singular value decomposition, global vectors

With the recent surge in the amount of text data available and used online, text classification has become a staple for data analysts extracting relevant information. Yet machine learning algorithms are susceptible to bias when applied to any large-scale automated task, especially in text analysis. As newer branches of machine learning, such as ensemble and deep learning, grow in popularity, we must analyze the potential pitfalls in the common experimental setups built around learning algorithms. Imbalance in text data is one such pitfall: when data is not equally distributed across all categories in a dataset, it can undermine the classification of underrepresented categories. In this research, we propose several techniques and approaches to tackle this obstacle. We prepared four datasets with varying degrees of imbalance for our experiments. We show that the feature extraction techniques singular value decomposition (SVD) and GloVe are key to reducing the effect of imbalance in text classification, especially with ensemble and deep learning. Using the results of our research, we also propose a modified ensemble classifier that can classify imbalanced and balanced data alike.

reference: Vol. 41, 2022, No. 1, pp. 98–115

doi: 10.31577/cai_2022_1_98

Computing and Informatics

formerly Computers and Artificial Intelligence
