Fault-Tolerant Optimal Attitude Tracking Control of a Quadrotor in the Presence of State and Input Constraints Using Safe Reinforcement Learning

Article type: Dynamics, Vibrations, and Control

Authors

1 Ph.D. Student, Faculty of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran

2 Corresponding author: Associate Professor, Faculty of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran

Abstract

This paper presents a method for designing an optimal attitude tracking control system for a quadrotor subject to component and actuator faults. The proposed integrated fault-tolerant control method is based on safe reinforcement learning and can guarantee the input and state constraints without prior knowledge of the vehicle dynamics. To this end, the proposed optimal method uses a dual neural network structure consisting of identifier and critic networks. In the weight update law of the identifier network, a variable forgetting factor is combined with experience replay, which increases convergence speed and robustness to measurement noise and reduces the estimation error. In this method, the constrained fault-tolerant optimal attitude tracking problem is made equivalent to an unconstrained optimal stabilization problem for an augmented system, in which the control input and state constraints are guaranteed by choosing, respectively, a suitable cost function on the input signal and suitable control barrier functions on the states. In addition, fault occurrence is detected without any bank of models or filters, simply by comparing the residual of the Hamilton-Jacobi-Bellman equation with a predetermined threshold. Uniform global stability of the weights of both networks, and consequently convergence of the control law to the optimal solution, is proved using Lyapunov theory, and simulation results demonstrate the correct performance of the method.
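As a rough illustration of how constraints of this kind are often encoded in the constrained-input reinforcement learning literature, the sketch below shows a non-quadratic input penalty and a logarithmic barrier. These are not necessarily the exact forms used in this paper. The affine dynamics $\dot{x} = f(x) + g(x)u$, the input bound $\lambda$, the positive-definite diagonal weight $R$, the value function $V^{*}$, and the state bounds $a < 0 < A$ on a scalar state $s$ are all assumptions introduced for this example.

```latex
% Non-quadratic input penalty for a bound |u_i| <= \lambda (assumed form),
% and the bounded control law it induces from the HJB stationarity condition
U(u) = 2 \int_{0}^{u} \lambda \, \tanh^{-1}\!\left(\frac{v}{\lambda}\right)^{\!\top} R \, \mathrm{d}v ,
\qquad
u^{*} = -\lambda \tanh\!\left( \frac{1}{2\lambda} R^{-1} g^{\top}(x) \, \nabla V^{*}(x) \right)

% Logarithmic barrier for a state bound a < s < A with a < 0 < A (assumed form)
B(s) = \ln\!\left( \frac{A}{a} \cdot \frac{a - s}{A - s} \right),
\qquad B(0) = 0, \qquad |B(s)| \to \infty \ \text{as} \ s \to a^{+} \ \text{or} \ s \to A^{-}
```

Because $\tanh$ is bounded by one in magnitude, a control law of this shape satisfies the input bound by construction, while a penalty built from $B(s)$ grows without bound as the state approaches either limit.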

Research Highlights

  • No need for knowledge of the vehicle dynamics
  • Improved speed of the identifier and critic networks
  • Guaranteed input and state constraints
  • Guaranteed system stability at all times
  • Model-free fault detection based on the HJB error

Article Title [English]

Fault-Tolerant Optimal Attitude Tracking Control of Quadrotor Subject to State and Input Constraints Using Safe Reinforcement Learning

Authors [English]

  • Sajad Roshanravan 1
  • Saeed Shamaghdari 2
1 Ph.D. Student, Faculty of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
2 Corresponding author: Associate Professor, Faculty of Electrical Engineering, Iran University of Science and Technology, Tehran, Iran
Abstract [English]

This article presents a method for designing a fault-tolerant optimal attitude tracking control (FTOATC) for a quadrotor UAV subject to component and actuator faults. The proposed fault-tolerant method is based on safe reinforcement learning (SRL) and can enforce input and state constraints without prior knowledge of the quadrotor dynamics. To this end, the method uses a dual neural network (NN) structure consisting of an identifier NN and a critic NN. The identifier NN update law combines a variable forgetting factor that depends on measurement noise with experience replay, which increases convergence speed and robustness to measurement noise and reduces the estimation error. In this method, the constrained FTOATC problem is recast as an equivalent unconstrained optimal stabilization problem for an augmented system, where the control input constraints and the state constraints are guaranteed by selecting a suitable cost function on the input signal and appropriate control barrier functions (CBFs) on the states, respectively. Furthermore, fault detection requires no model or filter bank; it is performed simply by comparing the residual of the Hamilton-Jacobi-Bellman (HJB) equation with a predetermined threshold. The uniform ultimate boundedness (UUB) of the identifier and critic NN weight errors and, as a result, the convergence of the control input to a neighborhood of the optimal solution are proved by Lyapunov theory, and the performance of the method is validated through simulation results.
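The threshold test described in the abstract can be sketched in a few lines. The abstract does not say how the HJB residual is computed or how the threshold is chosen, so this example takes a precomputed residual sequence as input. The function name `detect_fault` and the `min_consecutive` debouncing parameter are hypothetical additions for illustration and are not taken from the paper.

```python
from typing import Optional, Sequence


def detect_fault(residuals: Sequence[float], threshold: float,
                 min_consecutive: int = 1) -> Optional[int]:
    """Return the sample index at which a fault is declared, or None.

    A fault is declared once |residual| has exceeded `threshold` for
    `min_consecutive` samples in a row. The debouncing is an assumption
    of this sketch, meant to ignore isolated noise spikes.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if min_consecutive < 1:
        raise ValueError("min_consecutive must be at least 1")

    run = 0  # length of the current run of above-threshold samples
    for k, r in enumerate(residuals):
        run = run + 1 if abs(r) > threshold else 0
        if run >= min_consecutive:
            return k
    return None


# Hypothetical residual trace: small before the fault, large afterwards.
trace = [0.01, -0.02, 0.03, 0.4, 0.5, 0.45]
first_alarm = detect_fault(trace, threshold=0.1)
debounced_alarm = detect_fault(trace, threshold=0.1, min_consecutive=2)
```

With the hypothetical trace above, `first_alarm` is 3 and `debounced_alarm` is 4, while a single isolated spike with `min_consecutive=2` yields `None`. A larger `min_consecutive` trades detection delay for fewer false alarms.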

Keywords [English]

  • Quadrotor attitude control
  • Component and actuator faults
  • Fault-tolerant optimal control
  • Fault detection
  • Safe reinforcement learning

Volume 20, Issue 1 - Serial Number 75
Serial Number 75, Spring Quarterly
Farvardin 1403 (Solar Hijri)
Pages 141-161
  • Received: 15 Mehr 1402
  • Revised: 07 Aban 1402
  • Accepted: 11 Azar 1402
  • Published: 27 Farvardin 1403