
Remote sensing image segmentation method based on a dynamic optimized detail-aware network

  • Abstract: Existing remote sensing image segmentation models, such as those based on convolutional neural networks (CNN) and on the Transformer framework, have achieved great success, but they still struggle to fully preserve the details of the original encoder feature maps and to dynamically capture global contextual information. To address this, a novel segmentation method based on a dynamic optimized detail-aware network (DODNet) is proposed on top of a CNN–Transformer hybrid framework. First, ResNeXt-50 is adopted as the backbone of the encoder, and a multi-subtraction perception module (MSPM) is proposed to collect the spatial detail differences between multiscale feature maps, effectively reducing redundant information. Then, a dynamic information fusion block (DIFB) is designed for the decoder; it combines a global bi-level routing self-attention branch with a local attention branch to improve the acquisition of global and local information. Finally, a new channel-spatial attention module, the unified feature extractor (UFE), is proposed to further capture semantic and contextual information. Quantitative and visual analyses of comparison and ablation experiments on three classic public datasets (Vaihingen, Potsdam, and LoveDA) show that the proposed method outperforms ten state-of-the-art segmentation methods on the F1 score, overall accuracy (OA), and mean intersection over union (mIoU) metrics, with mIoU reaching 84.96%, 87.64%, and 52.43%, respectively, verifying its superior performance in segmenting high-resolution remote sensing images with complex backgrounds, large intra-class variance, and small inter-class variance.

     

    Abstract: Semantic segmentation is an important technology for remote sensing image processing and has been widely applied in many fields. Although existing remote sensing image segmentation models, such as convolutional neural network (CNN)-based and Transformer-based methods, have achieved great success in this domain, challenges remain, including the difficulty of fully preserving the details of the original encoder feature maps and of dynamically capturing global contextual information. To address these challenges, a novel remote sensing image segmentation method called the dynamic optimized detail-aware network (DODNet) is proposed based on a CNN–Transformer hybrid framework. First, a ResNeXt-50 network is employed as the backbone of the encoder, and a multi-subtraction perception module (MSPM) is designed to collect spatial detail differences between multiscale feature maps and thereby reduce redundant information. This module integrates multidirectional depth-wise separable convolutions with parallel dilated convolutions to enhance the feature representation ability. By performing pixel-wise subtraction after upsampling and spatial alignment, difference feature maps are generated that capture regions of significant variation, effectively preserving boundaries and other fine details in remote sensing images while improving the model's perception of small objects. Then, a dynamic information fusion block (DIFB), which combines a global bi-level routing self-attention branch with a local attention branch to improve the acquisition of global and local information, is designed for the decoder. The global bi-level routing self-attention branch uses a learnable regional routing network to filter out weakly associated background areas and then performs fine-grained attention within the retained semantically key windows. This scheme effectively addresses the dual challenges of background interference and computational efficiency in remote sensing image segmentation. The local attention branch uses multiscale convolutions to recover the local information that the global branch finds difficult to capture. Finally, a new channel-spatial attention module, the unified feature extractor (UFE), is proposed to obtain semantic and contextual information by serially fusing channel and spatial attention mechanisms. In the channel attention stage, dual-path average pooling along the width and height directions replaces traditional global pooling and is combined with a one-dimensional depth-wise separable convolution to extract channel features. In the spatial attention stage, a multiscale convolution fusion strategy is introduced and spatial attention weights are generated through instance normalization, so the module attends more to local features and foreground objects within an image. To verify the effectiveness and accuracy of the proposed method, comparison and ablation experiments were carefully designed and conducted on three typical public datasets: Vaihingen, Potsdam, and LoveDA. Quantitative and visual analyses of the results show that DODNet outperforms ten state-of-the-art segmentation methods in terms of the F1 score, overall accuracy (OA), and mean intersection over union (mIoU). In particular, the mIoU values reached 84.96%, 87.64%, and 52.43% on the three datasets, respectively, verifying the strong ability of the proposed DODNet to handle segmentation problems with complex background interference, large intra-class differences, and high inter-class similarity.
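The MSPM's core operation, as described above, is a pixel-wise subtraction between adjacent-scale encoder features after upsampling and spatial alignment. The following is a minimal PyTorch sketch of that step only, assuming the deeper map carries twice the channels of the shallower one (as in consecutive ResNeXt stages); the class and layer names are hypothetical, and the paper's full module additionally applies multidirectional depth-wise separable and parallel dilated convolutions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubtractionPerception(nn.Module):
    # Sketch of the cross-scale subtraction idea behind MSPM.
    # Assumption: `deep` has 2x the channels and half the resolution of `shallow`.
    def __init__(self, channels):
        super().__init__()
        # 1x1 projection so both scales share one channel width
        self.align = nn.Conv2d(channels * 2, channels, kernel_size=1)
        # depth-wise separable convolution to refine the difference map
        self.refine = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, shallow, deep):
        # upsample the coarser map to the shallow map's spatial size
        deep_up = F.interpolate(self.align(deep), size=shallow.shape[2:],
                                mode='bilinear', align_corners=False)
        # pixel-wise subtraction highlights regions that change across scales
        # (boundaries, small objects) while suppressing redundant responses
        diff = torch.abs(shallow - deep_up)
        return self.refine(diff)
```

The absolute difference keeps only what changes between scales, which is why the abstract credits the module with preserving boundaries and small objects rather than re-encoding the full feature content.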
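The DIFB's global branch builds on bi-level routing self-attention: a coarse, window-level routing step first selects the few most relevant regions for each query window, and fine-grained attention is then computed only inside those retained windows. Below is a simplified single-head sketch of that two-stage scheme; the window count, top-k value, and the use of mean-pooled window descriptors as the routing signal are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class BiLevelRoutingAttention(nn.Module):
    # Simplified sketch of bi-level routing self-attention.
    # Assumes H and W are divisible by `num_windows`; single head, no
    # positional encoding, for clarity only.
    def __init__(self, dim, num_windows=4, topk=2):
        super().__init__()
        self.nw = num_windows      # windows per spatial side
        self.topk = topk           # regions each query window may attend to
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, x):                           # x: (B, H, W, C)
        B, H, W, C = x.shape
        hw, ww = H // self.nw, W // self.nw         # window height / width
        # split into nw*nw non-overlapping windows: (B, nW, hw*ww, C)
        win = x.view(B, self.nw, hw, self.nw, ww, C)
        win = win.permute(0, 1, 3, 2, 4, 5).reshape(B, self.nw**2, hw*ww, C)
        q, k, v = self.qkv(win).chunk(3, dim=-1)
        # coarse routing: window-level descriptors and their affinity
        q_win, k_win = q.mean(2), k.mean(2)         # (B, nW, C)
        route = q_win @ k_win.transpose(1, 2)       # (B, nW, nW)
        idx = route.topk(self.topk, dim=-1).indices # keep top-k regions
        # gather keys/values of the selected regions for each query window
        bidx = torch.arange(B, device=x.device)[:, None, None]
        k_sel = k[bidx, idx].flatten(2, 3)          # (B, nW, topk*hw*ww, C)
        v_sel = v[bidx, idx].flatten(2, 3)
        # fine-grained attention inside the retained windows only
        attn = (q @ k_sel.transpose(-2, -1)) * self.scale
        out = attn.softmax(-1) @ v_sel              # (B, nW, hw*ww, C)
        out = out.view(B, self.nw, self.nw, hw, ww, C)
        out = out.permute(0, 1, 3, 2, 4, 5).reshape(B, H, W, C)
        return self.proj(out)
```

Restricting the fine attention to the top-k routed regions is what lets the branch suppress low-association background while keeping cost well below full global self-attention, whose complexity grows quadratically with the number of pixels.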
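The UFE is described as a serial channel-then-spatial attention module with two distinguishing choices: dual-path average pooling along height and width (instead of a single global pool) feeding a one-dimensional depth-wise convolution, and multiscale convolutions with instance normalization in the spatial stage. A compact sketch following that description is given below; the kernel sizes and the mean/max channel statistics are assumptions for illustration.

```python
import torch
import torch.nn as nn

class UnifiedFeatureExtractor(nn.Module):
    # Illustrative sketch of the UFE idea: channel attention from
    # height/width dual-path pooling plus a 1-D depth-wise convolution,
    # followed serially by multiscale spatial attention with instance
    # normalization. Kernel sizes are assumptions, not the paper's values.
    def __init__(self, channels, k=3):
        super().__init__()
        # shared 1-D depth-wise conv over the pooled H- and W-direction vectors
        self.dw1d = nn.Conv1d(channels, channels, k, padding=k // 2,
                              groups=channels)
        # multiscale convolutions for the spatial attention stage
        self.s3 = nn.Conv2d(2, 1, 3, padding=1)
        self.s7 = nn.Conv2d(2, 1, 7, padding=3)
        self.inorm = nn.InstanceNorm2d(1, affine=True)

    def forward(self, x):                        # x: (B, C, H, W)
        # --- channel attention: dual-path pooling along width and height ---
        ph = x.mean(dim=3)                       # (B, C, H) pooled over W
        pw = x.mean(dim=2)                       # (B, C, W) pooled over H
        ch = self.dw1d(ph).mean(-1) + self.dw1d(pw).mean(-1)   # (B, C)
        x = x * torch.sigmoid(ch)[:, :, None, None]
        # --- spatial attention: multiscale conv over channel statistics ---
        stats = torch.cat([x.mean(1, keepdim=True),
                           x.amax(1, keepdim=True)], dim=1)    # (B, 2, H, W)
        sa = self.inorm(self.s3(stats) + self.s7(stats))       # (B, 1, H, W)
        return x * torch.sigmoid(sa)
```

For example, `UnifiedFeatureExtractor(256)` applied to a `(1, 256, 64, 64)` tensor returns a reweighted tensor of the same shape, so a module of this kind can be dropped after any decoder stage without changing the surrounding architecture.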

     
