Ensemble Based Feature Extraction and Deep Learning Classification Model with Depth Vision

keywords: Human activities, improved LTXOR, BoW, Bi-LSTM, Bi-GRU classifier
It remains a challenging task to identify human activities from a video sequence or still image due to factors such as backdrop clutter, fractional occlusion, and changes in scale, point of view, appearance, and lighting. Different appliances, as well as video surveillance systems, human-computer interfaces, and robots used to study human behavior, require different activity classification systems. A four-stage framework for recognizing human activities is proposed in the paper. As part of the initial stages of pre-processing, video-to-frame conversion and adaptive histogram equalization (AHE) are performed. Additionally, watershed segmentation is performed and, from the segmented images, local texton XOR patterns (LTXOR), motion boundary scale-invariant feature transforms (MoBSIFT) and bag of visual words (BoW) based features are extracted. The Bidirectional gated recurrent unit (Bi-GRU) and the Bidirectional long short-term memory (Bi-LSTM) classifiers are used to detect human activity. In addition, the combined decisions of the Bi-GRU and Bi-LSTM classifiers are further fused, and their accuracy levels are determined. With this Dempster-Shafer theory (DST) technique, it is more likely that the results obtained from the analysis are accurate. Various metrics are used to assess the effectiveness of the deployed approach.
mathematics subject classification 2000: 46-T30
reference: Vol. 42, 2023, No. 4, pp. 965–992