Learned Spatio-Temporal Texture Descriptors for RGB-D Human Action Recognition

Zhengyuan Zhai

Beijing University of Posts and Telecommunications, Beijing Key Laboratory of Work Safety Intelligent Monitoring, Xitucheng Road 10, 100876 Beijing, China
Chunxiao Fan

Beijing University of Posts and Telecommunications, Beijing Key Laboratory of Work Safety Intelligent Monitoring, Xitucheng Road 10, 100876 Beijing, China
Yue Ming

Beijing University of Posts and Telecommunications, Beijing Key Laboratory of Work Safety Intelligent Monitoring, Xitucheng Road 10, 100876 Beijing, China

keywords: 3D pixel difference vectors, compact binary face descriptor, feature fusion, human action recognition, RGB-depth videos

Since the arrival of Kinect, action recognition from depth images has attracted wide attention from researchers, and various descriptors have been proposed. Among them, Local Binary Pattern (LBP) texture descriptors have the property of appearance invariance. However, LBP and most of its variants are hand-crafted: they demand strong prior knowledge from their designers and are not discriminative enough for recognition tasks. To address this, this paper develops compact spatio-temporal texture descriptors for color and depth videos, i.e. 3D-compact LBP (3D-CLBP) and local depth patterns (3D-CLDP), inspired by compact binary face descriptor learning in face recognition. Extensive experiments on three standard datasets, 3D Online Action, MSR Action Pairs and MSR Daily Activity 3D, demonstrate that our method outperforms most of the compared methods and can capture spatio-temporal texture cues in videos.

mathematics subject classification 2000: 68Txx

reference: Vol. 37, 2018, No. 6, pp. 1339–1362

doi: 10.4149/cai_2018_6_1339

Computing and Informatics

formerly Computers and Artificial Intelligence
