UAV spatiotemporal crowdsourcing resource allocation for industrial scenarios

劉婭汐; 李旭龍; 霍佳皓; 皇甫偉

doi:10.13374/j.issn2095-9389.2024.06.01.001

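Abstract: Resource allocation for unmanned aerial vehicle (UAV) spatiotemporal crowdsourcing is one of the important tasks in energy management for the industrial Internet of Things (IoT). Although existing methods consider the age of information (AoI) metric, which jointly reflects time sensitivity and fairness, they ignore the impact of UAV no-fly zones and eavesdroppers on data freshness. This paper proposes a deep reinforcement learning-based resource allocation method for UAV spatiotemporal crowdsourcing. Subject to UAV no-fly zone constraints, and with jamming signals sent toward eavesdroppers to protect data security, the method minimizes the average AoI and the energy consumption of IoT devices, thereby obtaining the optimal UAV trajectories, jamming signal transmit power, and IoT transmit power. However, resource allocation in UAV spatiotemporal crowdsourcing is complex and challenging, mainly because there are many types of decision variables and their relationship to the system performance metrics under quality of service requirements is complicated. This paper models the problem as a Markov decision process and solves it with an advanced deep reinforcement learning algorithm, the soft actor-critic (SAC) algorithm. The effectiveness and correctness of the proposed algorithm for the UAV spatiotemporal crowdsourcing resource allocation task are verified in multi-UAV scenarios. In addition, compared with two other advanced deep reinforcement learning algorithms, namely the deep deterministic policy gradient algorithm and the twin delayed deep deterministic policy gradient algorithm, SAC converges faster and reaches better solutions. Finally, a scheme for selecting the optimal number of UAVs is analyzed.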

Abstract: Spatiotemporal crowdsourcing uses Internet of Things (IoT) devices distributed across industrial environments to collect and transmit spatiotemporal data related to industrial operations. Unmanned aerial vehicles (UAVs) play a crucial role in gathering these data from the IoT devices. In industrial IoT energy management, allocating resources for UAV spatiotemporal crowdsourcing poses substantial challenges. Traditional approaches have focused on optimizing the age of information (AoI) to ensure timely and equitable data updates. However, these methods often overlook critical operational constraints such as UAV no-fly zones and the risk of data interception by eavesdroppers, both of which can degrade the freshness and integrity of the information being gathered and transmitted. To address these shortcomings, this paper presents a novel deep reinforcement learning-based framework for UAV spatiotemporal crowdsourcing resource allocation. The approach aims to minimize the average AoI across the network while also reducing the energy consumption of IoT devices. It incorporates the spatial constraints imposed by UAV no-fly zones and controls the transmission of jamming signals to mitigate the threat posed by eavesdroppers, thus protecting data security. The allocation problem is complex because the number of decision variables is large and increases linearly with the service duration. Furthermore, the relationship between the performance metrics and the decision variables is intricate and must respect quality of service requirements. The problem is formalized as a Markov decision process (MDP), which provides a structured model of the decision-making faced by UAVs in a dynamic environment.
To solve this MDP, we employ the soft actor-critic (SAC) algorithm, an advanced deep reinforcement learning method known for its sample efficiency and stability. SAC handles the continuous action spaces typical of UAV flight path and power control problems, which makes it well suited to our application. We test the proposed method in scenarios involving multiple UAVs, demonstrating the algorithm’s effectiveness in managing the spatiotemporal allocation of resources. Our results show that SAC converges faster and reaches better solutions than two other state-of-the-art methods, the twin delayed deep deterministic policy gradient (TD3) and deep deterministic policy gradient (DDPG) algorithms. Furthermore, the paper examines how to select the optimal number of UAVs to balance the trade-offs between coverage, energy consumption, and operational efficiency. By analytically and empirically examining the impact of UAV fleet size on system performance, we provide insights into configuring UAV networks to achieve optimal outcomes in terms of AoI, energy management, and security. In conclusion, our research introduces a robust and intelligent framework for UAV resource allocation. The demonstrated efficacy of the SAC algorithm in this context paves the way for its future application in other domains where secure, efficient, and intelligent resource management is paramount.
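
As a rough illustration of the quantities the abstract names, the following Python sketch shows one common discrete-time AoI update and a scalar reward that penalizes average AoI, IoT transmit energy, and no-fly-zone violations. It is not the paper's formulation: the reset-to-one AoI rule, the circular `NoFlyZone`, the weights `w_energy` and `nfz_penalty`, and all function names are assumptions made for the example, and jamming and eavesdropper modeling are omitted.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class NoFlyZone:
    """Hypothetical circular no-fly zone (centre and radius in metres)."""
    cx: float
    cy: float
    radius: float

    def contains(self, x: float, y: float) -> bool:
        # A point is inside the zone if it lies within the circle.
        return (x - self.cx) ** 2 + (y - self.cy) ** 2 <= self.radius ** 2


def step_aoi(aoi: Sequence[int], delivered: Sequence[bool]) -> List[int]:
    """Advance every device's AoI by one time slot.

    Assumed rule: AoI resets to 1 when a fresh update is delivered in
    this slot, and otherwise grows by 1.
    """
    return [1 if ok else age + 1 for age, ok in zip(aoi, delivered)]


def reward(
    aoi: Sequence[int],
    tx_energy: Sequence[float],
    uav_xy: Sequence[Tuple[float, float]],
    zones: Sequence[NoFlyZone],
    w_energy: float = 0.1,      # assumed trade-off weight
    nfz_penalty: float = 10.0,  # assumed penalty per violating UAV
) -> float:
    """Negative weighted cost: average AoI + energy + no-fly-zone penalty."""
    avg_aoi = sum(aoi) / len(aoi)
    energy = sum(tx_energy)
    violations = sum(
        1 for (x, y) in uav_xy if any(z.contains(x, y) for z in zones)
    )
    return -(avg_aoi + w_energy * energy) - nfz_penalty * violations
```

A reward of this shape could serve as the per-step signal for an off-the-shelf SAC implementation, with UAV displacements and transmit powers as continuous actions; the paper's own state, action, and reward definitions may differ.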

UAV spatiotemporal crowdsourcing resource allocation based on deep reinforcement learning