
Fast Task Allocation for Heterogeneous UAVs Based on Improved DQN

  • Abstract: With the rapid development of UAV technology, multi-UAV systems have shown great potential in executing complex tasks, and efficient task allocation strategies are crucial to the overall performance of such systems. However, traditional methods often fail to generate effective allocation strategies under varying environmental disturbances. This paper therefore accounts for environmental uncertainty and focuses on applying an improved reinforcement learning algorithm to UAV task allocation, enabling multi-UAV systems to respond rapidly and use resources efficiently. First, the task allocation problem is modeled as a Markov Decision Process, with a neural network approximating the policy so that the high-dimensional, complex state spaces arising in task allocation can be handled efficiently, avoiding the curse of dimensionality that afflicts traditional methods; a prioritized experience replay mechanism is also introduced, substantially reducing the online computational burden. Simulation results show that, compared with other reinforcement learning methods, the proposed algorithm converges well and is notably more robust in complex environments. Moreover, it completes a suitable UAV assignment for a given set of tasks in only 0.23 seconds and can quickly generate task allocation plans for large-scale UAV swarms.
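The Markov Decision Process formulation mentioned in the abstract can be sketched minimally as follows. The state tracks which tasks remain and which UAVs are idle, and an action assigns one UAV to one task; the names and the unit reward are illustrative assumptions, not the paper's actual state encoding or reward design.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AllocationState:
    pending_tasks: tuple  # ids of tasks not yet assigned
    free_uavs: tuple      # ids of UAVs currently idle

def step(state, uav, task, reward_fn):
    """One MDP transition: assign `uav` to `task` and collect a reward."""
    next_state = AllocationState(
        pending_tasks=tuple(t for t in state.pending_tasks if t != task),
        free_uavs=tuple(u for u in state.free_uavs if u != uav),
    )
    return next_state, reward_fn(uav, task)

# Toy episode: two tasks, two UAVs, assign uav_a to task 0.
s0 = AllocationState(pending_tasks=(0, 1), free_uavs=("uav_a", "uav_b"))
s1, r = step(s0, "uav_a", 0, reward_fn=lambda u, t: 1.0)
```

In a full implementation the policy network would map such states to assignment actions; here the transition logic alone illustrates the decision-process structure.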

     

    Abstract: The rapid advancement of UAV technology has highlighted the tremendous potential of multi-UAV systems in handling complex tasks, and efficient task allocation strategies are crucial to the overall performance of these systems. Traditional allocation methods, which are often heuristic or rule-based, work well in simple environments but struggle in complex scenarios: environmental disturbances and resource constraints hinder their effectiveness, and their rigidity leads to inefficiencies and delays when conditions change unexpectedly. Reinforcement learning, in contrast, is well suited to multi-UAV task allocation. It does not rely on pre-defined models or external knowledge; the system learns optimal strategies through continuous interaction with the environment, adapting to dynamic conditions and improving its decision-making over time. This paper proposes a deep-reinforcement-learning approach to multi-UAV task allocation that accounts for the uncertainties typically encountered in real-world battlefield environments, such as varying wind speeds, rainfall, and other external conditions that affect UAV performance. The primary objective is to ensure that multi-UAV systems can respond swiftly to multiple simultaneous tasks while optimizing resource utilization. To this end, the paper models the task allocation problem as a Markov Decision Process.
In this framework, the system selects the most appropriate task allocation strategy based on the current state of the environment, ensuring flexible and timely decision-making. To improve stability and robustness, an evaluation network and a target network are designed to work in tandem. By separating the state value from the advantage value, the model reduces the noise introduced by action selection, yielding more accurate predictions and better decisions. The paper also introduces a prioritized experience replay module, which ranks each experience sample by its temporal-difference (TD) error so that the most informative samples are replayed first. This focuses learning on high-value experiences, accelerating training and avoiding the inefficiency of uniform replay, which often reuses low-value samples. In addition, neural-network approximation techniques alleviate the computational burden of online calculations, which is especially important in real-time applications with limited computational resources. Experimental results demonstrate that the proposed method substantially reduces resource waste in UAV task scheduling: for each allocation request, the algorithm completes UAV assignment in an average of just 0.23 seconds, greatly enhancing allocation efficiency. Compared with traditional methods, the proposed algorithm is not only faster but, thanks to the prioritized experience replay module, also converges more quickly and stably. Its scalability is further validated through simulations involving larger UAV fleets.
The results show that the algorithm handles larger fleets efficiently without sacrificing performance. Further simulation tests confirm that the proposed method optimizes resource allocation, reduces system interference, and accelerates convergence. In conclusion, the method presented in this paper offers significant improvements in multi-UAV task allocation, particularly in allocation efficiency and system adaptability.
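The TD-error-based prioritized experience replay described in the abstract can be sketched as follows: transitions with larger absolute TD error are sampled more often. The proportional weighting and the epsilon floor used here are common choices but assumptions on our part; the paper may instead use a rank-based scheme or importance-sampling corrections.

```python
import random

class PrioritizedReplay:
    def __init__(self, eps=1e-3):
        self.buffer = []  # list of (priority, transition) pairs
        self.eps = eps    # floor so zero-error samples can still be drawn

    def add(self, transition, td_error):
        # Priority is proportional to |TD error|, plus a small floor.
        self.buffer.append((abs(td_error) + self.eps, transition))

    def sample(self, k, rng=random):
        # Draw k transitions with probability proportional to priority.
        weights = [p for p, _ in self.buffer]
        picks = rng.choices(range(len(self.buffer)), weights=weights, k=k)
        return [self.buffer[i][1] for i in picks]

# Usage: a high-TD-error transition dominates the sampled minibatch.
buf = PrioritizedReplay()
buf.add("low_error_sample", td_error=0.0)
buf.add("high_error_sample", td_error=10.0)
batch = buf.sample(5, rng=random.Random(0))
```

Production implementations typically back this with a sum-tree for O(log n) sampling; a weighted list keeps the sketch short and readable.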

     
