High flexibility of heterogeneous tri-robot collaborative handling

張樹忠; 齊春雨; 張弓; 蘇佳鴻; 邱偉前; 阮玉鎮

doi:10.13374/j.issn2095-9389.2024.12.16.002

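Summary: To address the compliance problem in cooperative transportation by a heterogeneous tri-robot system, a reinforcement learning control method based on proximal policy optimization (PPO) is proposed. A simulation environment for heterogeneous tri-robot cooperative transportation was built in the CoppeliaSim robot simulator, and comparative simulations of force control and reinforcement learning control were carried out. The simulation results show that, under reinforcement learning control, the trajectory error of the object's center of mass performs best in the Z direction, at only 4.7% of that under force control, and that the end-effector velocity of Robot 2 and the angular velocity of its representative joint vary more smoothly. Using a sim2real approach, both control methods were then deployed in a tri-robot cooperative transportation experiment. The experimental results show that, under reinforcement learning control, the object trajectory tracking error in the Z direction is again the best, at only 5.4% of that under force control. The velocity variation of Robot 2 in the X direction is only 20.7% of that under force control, and its representative joint shows better compliance, with an angular velocity variation of only 35.2% of that under force control. The simulation and experimental results show that reinforcement learning gives the better control performance and that transfer from simulation to reality is feasible.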

Abstract: This paper proposes a reinforcement learning (RL) control framework based on the proximal policy optimization (PPO) algorithm to address compliance in cooperative transportation by heterogeneous tri-robot systems. The focus is on improving motion coordination and force adaptability among three heterogeneous robots during collaborative object transportation. A high-fidelity simulation environment was first constructed in the CoppeliaSim robot simulator, in which three robots with distinct kinematic and dynamic configurations were programmed to jointly manipulate a shared object. Comparative simulations of a traditional force control method and the proposed RL-based approach were conducted to evaluate trajectory tracking accuracy, motion smoothness, and system compliance. Under the RL control framework, a policy was trained with PPO to select the robots’ joint actions by maximizing a reward function designed to penalize trajectory deviations, excessive contact forces, and abrupt velocity changes. The simulation results show that the RL-controlled system substantially improves vertical (Z-axis) trajectory tracking precision: the trajectory error of the object’s center of mass in the Z direction was reduced to 4.7% of that observed under conventional force control. Furthermore, Robot 2, selected as a representative agent owing to its central role in the task, moved significantly more smoothly under RL control. Its end-effector velocity variations in the horizontal (X–Y) plane were attenuated by 82% compared with force control, while angular velocity fluctuations in its primary rotational joint were reduced to 35% of the baseline values, indicating enhanced mechanical compliance and reduced oscillatory behavior. To validate the real-world applicability of the learned policies, a sim2real transfer methodology was implemented.
The control strategies were deployed on a physical tri-robot platform comprising one six-degree-of-freedom (DOF) industrial manipulator and two customized four-DOF collaborative robots, tasked with synchronously transporting a deformable payload. The experimental results agreed with the simulation predictions: the RL-based controller maintained superior Z-direction trajectory tracking, limiting errors to 5.4% of those under force control. Robot 2’s motion was likewise more compliant in the physical experiments, with its X-direction velocity variations reduced to 20.7% of the force control benchmark. Joint-level analysis showed that the angular velocity variations of Robot 2’s third joint, a pivotal component for vertical motion compensation, were suppressed to 35.2% of the force control values, confirming the RL controller’s ability to mitigate mechanical vibrations and adapt to dynamic payload interactions. The study also investigated the robustness of the RL framework to real-world uncertainties, including sensor noise, communication latency, and payload deformation. Despite these challenges, the RL controller maintained stable performance, achieving a 92% reduction in peak contact forces relative to force control during sudden payload shifts. Statistical analysis of the motion data further indicated that the RL-based system reduced the standard deviation of inter-robot coordination errors by 76% in simulation and by 68% in physical experiments, underscoring its consistency across domains. Both the simulation and the experimental findings demonstrate that the PPO-based RL framework not only surpasses traditional force control in precision and compliance but also bridges the sim2real gap.
The framework’s ability to learn adaptive policies in simulation and transfer them to physical robots with minimal fine-tuning highlights its potential for industrial applications that require heterogeneous multi-robot collaboration. This work contributes to compliant robotic control by providing a scalable, data-driven solution that balances trajectory accuracy, motion smoothness, and real-world adaptability in complex cooperative tasks.
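
The abstract describes the reward only qualitatively, as penalizing trajectory deviations, excessive contact forces, and abrupt velocity changes. The following is a minimal sketch of what such a per-step reward could look like, not the authors' implementation. The function name `compliance_reward`, the `RewardWeights` values, the `force_limit` threshold, and the choice of Euclidean norms are all assumptions made for illustration; the abstract reports none of them.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights; the abstract does not report the actual values.
    tracking: float = 1.0
    force: float = 0.01
    smoothness: float = 0.1


def compliance_reward(pos, ref_pos, contact_force, vel, prev_vel,
                      force_limit=20.0, weights=RewardWeights()):
    """Return a scalar reward: zero at best, more negative as penalties grow.

    pos, ref_pos  : object center-of-mass position and its reference (x, y, z)
    contact_force : measured contact force magnitude
    vel, prev_vel : end-effector velocity at this step and the previous step
    force_limit   : assumed threshold above which contact force is penalized
    """
    # Trajectory deviation: Euclidean distance from the reference position.
    tracking_err = math.dist(pos, ref_pos)
    # Excessive contact force: only the part above the assumed limit counts.
    excess_force = max(0.0, contact_force - force_limit)
    # Abrupt velocity change: magnitude of the step-to-step velocity difference.
    velocity_jump = math.dist(vel, prev_vel)
    return -(weights.tracking * tracking_err
             + weights.force * excess_force
             + weights.smoothness * velocity_jump)
```

In a PPO training loop, a scalar of this kind would be returned by the simulation environment at every control step. The relative size of the three weights then sets the trade-off between tracking accuracy and smooth, compliant motion.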
