INCSA-UNET: Spatial Attention Inception UNET for Aerial Images Segmentation

keywords: Segmentation, deep learning, CNN, INCSA-UNET, attention
Building segmentation from aerial images is essential for applications such as urban planning and population estimation. Since deep learning methods significantly advanced the performance of many computer vision tasks, fully convolutional networks (FCNs), and UNET in particular, have achieved promising results in segmentation problems. However, Convolutional Neural Networks (CNNs) built on standard convolution operations suffer from problems such as overfitting and difficulty in precisely extracting the boundaries of objects of different sizes and shapes. In this study, we combine Inception blocks with UNET to enhance feature extraction, implementing a two-level Inception approach that covers the entire encoding stage. In the proposed architecture, a structured form of dropout (DropBlock) is used to prevent overfitting, and spatial/channel attention modules are applied to enhance important features by focusing on key areas. We evaluate the proposed INCSA-UNET architecture on the publicly available Massachusetts dataset and perform two-fold cross-validation experiments for a more thorough analysis. The experimental results show that the proposed architecture does not significantly increase the number of parameters relative to UNET and yields a significant improvement in the F1 and Kappa quantitative measures.
reference: Vol. 40, 2021, No. 6, pp. 1244–1262
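
The F1 and Kappa measures named in the abstract can be illustrated with a minimal sketch. It assumes binary building/background labels flattened to 0/1 sequences and uses the standard textbook definitions of F1 and Cohen's kappa; the function names are hypothetical, and the paper's exact evaluation protocol is not given in this record.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP, FN, TN for flat sequences of 0/1 labels (1 = building)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def cohen_kappa(tp, fp, fn, tn):
    """Cohen's kappa = (p_o - p_e) / (1 - p_e) for a 2x2 confusion matrix."""
    n = tp + fp + fn + tn
    if n == 0:
        return 0.0
    p_o = (tp + tn) / n  # observed agreement
    # Chance agreement from the row and column marginals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    return (p_o - p_e) / (1 - p_e) if p_e != 1 else 0.0


if __name__ == "__main__":
    # Toy labels for illustration only, not data from the paper.
    truth = [1, 1, 1, 0, 0, 0, 0, 0]
    pred = [1, 1, 0, 1, 0, 0, 0, 0]
    tp, fp, fn, tn = confusion_counts(truth, pred)
    print(f1_score(tp, fp, fn), cohen_kappa(tp, fp, fn, tn))
```

Kappa corrects the raw pixel agreement for agreement expected by chance, which matters in aerial imagery where background pixels typically far outnumber building pixels.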