Remote sensing image segmentation method based on dynamic optimized detail-aware network


     

Abstract: Semantic segmentation is of great practical value in remote sensing image processing and has been widely applied across many fields. However, high-resolution remote sensing images are complex in several respects: complex background interference, large intra-class differences, and obvious inter-class similarities blur target boundaries, while the scale of target objects (e.g., buildings, vegetation, and roads) varies greatly, further exacerbating the segmentation task. Existing remote sensing image segmentation models, such as those based on convolutional neural networks (CNNs) and the Transformer framework, have achieved great success, but they still struggle to fully preserve the details of the original encoder feature maps and to dynamically capture global contextual information. Therefore, based on a CNN-Transformer hybrid framework, a novel segmentation method called Dynamic Optimized Detail-Aware Network (DODNet) is proposed. First, ResNeXt-50 is adopted as the backbone network of the encoder, and a multi-subtraction perception module (MSPM) is designed to collect the spatial detail differences between multi-scale feature maps, effectively reducing redundant information. Then, a dynamic information fusion block (DIFB) is designed in the decoder, combining a global bi-level routing self-attention branch with a local attention branch. The global branch first uses a learnable regional routing network to filter out weakly associated background regions, and then performs fine-grained attention within the retained semantically key windows; this addresses the dual challenges of background interference and computational efficiency in remote sensing image processing, achieving efficient global modeling. The local branch uses multi-scale convolutions to compensate for the local information that the global branch struggles to capture. Finally, a new channel-spatial attention module, the unified feature extractor (UFE), is proposed to further capture semantic and contextual information. Quantitative and visual analyses of comparison and ablation experiments on the Vaihingen and Potsdam datasets show that DODNet outperforms eight state-of-the-art segmentation methods in F1 score, overall accuracy (OA), and mean intersection over union (mIoU); in particular, the mIoU reaches 84.96% and 87.64% on the two datasets, respectively, verifying the strong ability of DODNet to segment high-resolution remote sensing images with complex background interference, large intra-class differences, and obvious inter-class similarities.
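The abstract names the MSPM's core mechanism (subtraction between multi-scale feature maps) without implementation details. Below is a minimal PyTorch sketch of that subtraction idea, assuming adjacent encoder stages have already been projected to a common channel width; the class and parameter names are hypothetical, and the paper's actual MSPM may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtractionUnit(nn.Module):
    """Hypothetical sketch of the subtraction idea behind the MSPM:
    the element-wise difference between two adjacent encoder feature
    maps isolates the spatial detail that the coarser map has lost.
    Assumes both inputs share the same channel width; the paper's
    actual MSPM wiring may differ."""

    def __init__(self, channels: int):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, shallow: torch.Tensor, deep: torch.Tensor) -> torch.Tensor:
        # Upsample the deeper (lower-resolution) map to the shallow map's
        # size, then subtract: the residual highlights fine details while
        # suppressing information both scales share (the redundancy).
        deep_up = F.interpolate(deep, size=shallow.shape[-2:],
                                mode="bilinear", align_corners=False)
        return self.refine(torch.abs(shallow - deep_up))
```

Applied pairwise across the backbone's stages, such units would yield difference maps the decoder can fuse alongside the original features, which is one plausible reading of how the MSPM "collects spatial detail differences".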

     
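The DIFB's global branch is described as bi-level routing self-attention: a coarse routing step keeps only the most related regions, and fine-grained attention then runs over tokens gathered from those regions. This mechanism is known from BiFormer; the sketch below illustrates that general mechanism in PyTorch rather than the paper's exact design, with the region count and top-k as assumed hyper-parameters.

```python
import torch
import torch.nn as nn

class BiLevelRoutingAttention(nn.Module):
    """Illustrative bi-level routing attention (in the style of BiFormer),
    not the paper's exact DIFB design. Coarse step: each query region is
    routed to its top-k most related key regions. Fine step: ordinary
    attention over the tokens gathered from those regions only."""

    def __init__(self, dim: int, num_regions: int = 7, top_k: int = 4):
        super().__init__()
        self.s, self.k, self.scale = num_regions, top_k, dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, H, W, C) with H and W divisible by num_regions.
        B, H, W, C = x.shape
        s, k = self.s, self.k
        h, w = H // s, W // s  # tokens per region along each axis
        q, kk, v = self.qkv(x).chunk(3, dim=-1)

        def to_regions(t: torch.Tensor) -> torch.Tensor:
            # (B, H, W, C) -> (B, s*s regions, h*w tokens, C)
            t = t.reshape(B, s, h, s, w, C).permute(0, 1, 3, 2, 4, 5)
            return t.reshape(B, s * s, h * w, C)

        q_r, k_r, v_r = map(to_regions, (q, kk, v))

        # Coarse routing: mean-pool each region to one descriptor, score
        # all region pairs, and keep the top-k key regions per query region
        # (this is the step that drops low-association background areas).
        q_bar, k_bar = q_r.mean(dim=2), k_r.mean(dim=2)
        routing = q_bar @ k_bar.transpose(-1, -2)        # (B, s*s, s*s)
        idx = routing.topk(k, dim=-1).indices            # (B, s*s, k)

        # Gather the key/value tokens of the selected regions.
        idx_exp = idx[..., None, None].expand(-1, -1, -1, h * w, C)
        k_all = k_r[:, None].expand(-1, s * s, -1, -1, -1)
        v_all = v_r[:, None].expand(-1, s * s, -1, -1, -1)
        k_g = torch.gather(k_all, 2, idx_exp).reshape(B, s * s, k * h * w, C)
        v_g = torch.gather(v_all, 2, idx_exp).reshape(B, s * s, k * h * w, C)

        # Fine-grained attention inside each query region.
        attn = (q_r @ k_g.transpose(-1, -2) * self.scale).softmax(dim=-1)
        out = attn @ v_g                                  # (B, s*s, h*w, C)

        out = out.reshape(B, s, s, h, w, C).permute(0, 1, 3, 2, 4, 5)
        return self.proj(out.reshape(B, H, W, C))
```

For example, `BiLevelRoutingAttention(64)(torch.randn(1, 28, 28, 64))` routes each of the 49 regions to its 4 most related regions before token-level attention, which is how the routing step skips weakly associated background areas while keeping the fine attention cheap.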

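The UFE is described only as a channel-spatial attention module, so its internals cannot be reconstructed from the abstract. Purely as a point of reference, a generic sequential channel-then-spatial attention block (in the style of CBAM) looks as follows; this is a stand-in illustration, not the paper's UFE.

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """Generic channel-then-spatial attention (CBAM-style); a stand-in
    illustration only, not the paper's actual UFE design."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Channel attention: squeeze spatial dims, re-weight channels.
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )
        # Spatial attention: 7x7 conv over pooled channel descriptors.
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel weights from average- and max-pooled descriptors.
        avg = self.mlp(x.mean(dim=(2, 3)))
        mx = self.mlp(x.amax(dim=(2, 3)))
        x = x * torch.sigmoid(avg + mx).view(b, c, 1, 1)
        # Spatial weights from per-pixel channel statistics.
        stats = torch.cat([x.mean(dim=1, keepdim=True),
                           x.amax(dim=1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(stats))
```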
