Multi-label Bird Species Classification Using Sequential Aggregation Strategy from Audio Recordings

keywords: Multi-label, sequential, augmentation, recurrent neural network, convolutional neural network, transfer learning
Birds are excellent bioindicators, playing a vital role in maintaining the delicate balance of ecosystems. Identifying species from bird vocalizations is arduous but valuable for research. This paper focuses on detecting multiple bird vocalizations in audio recordings. The proposed work uses a deep convolutional neural network (DCNN) and a recurrent neural network (RNN) architecture to learn bird vocalizations from mel-spectrograms and mel-frequency cepstral coefficients (MFCCs), respectively. We adopted a sequential aggregation strategy to reach a decision for each audio file: we normalized the aggregated sigmoid probabilities and took the nodes with the highest scores as the target species. We evaluated the proposed methods on recordings of ten species from the Xeno-canto bird sound database. We compared the performance of our approach with that of transfer learning and Vanilla-DNN methods. Notably, the proposed DCNN and VGG-16 models achieved average F1 scores of 0.75 and 0.65, respectively, outperforming the acoustic cue-based Vanilla-DNN approach.
reference: Vol. 42, 2023, No. 5, pp. 1255–1280
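
The file-level decision described in the abstract (aggregate per-segment sigmoid outputs, normalize, keep the highest-scoring nodes) can be illustrated with a minimal sketch. The abstract does not specify the aggregation operator, the normalization, or the selection rule, so the summation over segments, the division by the maximum, the `threshold` value, and the function name `aggregate_predictions` are all assumptions made for illustration only, not the authors' method.

```python
import numpy as np


def aggregate_predictions(segment_probs, threshold=0.5):
    """Combine per-segment sigmoid outputs into one file-level decision.

    segment_probs: array-like of shape (n_segments, n_species) with values
    in [0, 1]. The sum/max normalization and the threshold are assumptions;
    the abstract only says the aggregated probabilities are normalized and
    the highest-scoring nodes are selected.
    """
    probs = np.asarray(segment_probs, dtype=float)
    if probs.ndim != 2 or probs.shape[0] == 0:
        raise ValueError("expected a non-empty (n_segments, n_species) array")

    # Aggregate evidence sequentially across segments (assumed: summation).
    aggregated = probs.sum(axis=0)

    # Normalize so the strongest species node has score 1 (assumed scheme).
    peak = aggregated.max()
    normalized = aggregated / peak if peak > 0 else aggregated

    # Keep every node whose normalized score reaches the assumed threshold.
    selected = np.flatnonzero(normalized >= threshold)
    return normalized, selected


# Hypothetical sigmoid outputs for three segments and three species.
segment_probs = [
    [0.9, 0.1, 0.6],
    [0.8, 0.2, 0.7],
    [0.7, 0.0, 0.5],
]
scores, species = aggregate_predictions(segment_probs)
```

With these made-up inputs, species 0 and 2 are retained and species 1 is dropped. A top-k rule could replace the threshold if the number of species per recording were known in advance.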