PID Controller Tuning with Deep Reinforcement Learning Policy Gradient Methods

Kasra Sinaei; Mohammad Reza Ha'iri Yazdi

Published in: 29th Annual International Conference of the Iranian Society of Mechanical Engineers and 8th Thermal Power Plants Industry Conference

Year of publication: 1400 (Solar Hijri calendar)

Document type: Conference paper

Language: English

Length: 6 pages (PDF)

Permanent link to this paper:

https://civilica.com/doc/1238512

National scientific document ID:

ISME29_304

Indexing date: 13 Tir 1400

Abstract:

In this paper, the challenge of tuning a PID controller for a single-input single-output (SISO) system is addressed with a pair of reinforcement learning (RL) agents. These agents automatically find optimal values for the controller parameters (kp, ki, kd). First, a self-balancing robot with two coaxial wheels was simulated using the PyBullet physics library. Motors and an inertial measurement unit (IMU) were added using PyBullet features. Next, the robot's environment was defined using the OpenAI Gym library. Both the state space and the action space of the RL agents are continuous, and artificial neural networks (ANNs) were used as function approximators in the agents. For faster computation and training, the agents were implemented with Microsoft COAX, JAX, and Haiku, which support GPU acceleration. Neural network backpropagation is computationally expensive. If the ANN's forward pass becomes more demanding than the hardware can handle, it can cause problems for real-time simulation, although step-by-step simulation remains possible in all cases. During training, the agents' properties were recorded and plotted. Finally, the agents were compared with each other and with a controller tuned manually using the classical method. The system is inherently unstable, and a PID controller with untuned (randomly set) gains does not change that. Stability criteria (controller stability, torso pitch angle, linear or angular speed of the center of mass, etc.) should therefore be included in the reward function for the best possible results.
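The following sketch illustrates how the tuning problem described above can be posed. An agent's continuous action is a gain triple (kp, ki, kd), and the return it maximizes comes from a reward that penalizes pitch angle and angular speed and heavily penalizes falling.

The sketch rests on several assumptions. It does not use the paper's PyBullet robot. Instead it uses a hypothetical toy inverted-pendulum model (pitch acceleration = (g/l)·sin(pitch) + u) integrated with semi-implicit Euler. The helper names (`PID`, `step_reward`, `episode_return`), the reward weights, the fall threshold, and the -100 fall penalty are all illustrative choices, not values from the paper. No RL agent (PPO, A2C, COAX) is included; such an agent would search over the gains to maximize `episode_return`.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class PID:
    """Discrete-time PID controller; the three gains are what an RL agent would output."""
    kp: float
    ki: float
    kd: float
    dt: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, error: float) -> float:
        self.integral += error * self.dt
        # Skip the derivative on the first call to avoid a "derivative kick".
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative


def step_reward(pitch: float, pitch_rate: float, fell: bool,
                w_pitch: float = 1.0, w_rate: float = 0.1) -> float:
    """Hypothetical shaped reward: penalize tilt and angular speed, heavily penalize a fall."""
    if fell:
        return -100.0
    return -(w_pitch * pitch ** 2 + w_rate * pitch_rate ** 2)


def episode_return(kp: float, ki: float, kd: float,
                   dt: float = 0.01, steps: int = 500,
                   g_over_l: float = 19.62, pitch0: float = 0.1,
                   fall_limit: float = 0.5) -> float:
    """Roll out a toy inverted pendulum (assumed model, not the paper's robot)
    under PID control and return the summed reward of one episode."""
    pid = PID(kp, ki, kd, dt)
    pitch, pitch_rate = pitch0, 0.0
    total = 0.0
    for _ in range(steps):
        u = pid.update(0.0 - pitch)              # setpoint: upright, pitch = 0 rad
        pitch_acc = g_over_l * math.sin(pitch) + u
        pitch_rate += pitch_acc * dt             # semi-implicit Euler: velocity first
        pitch += pitch_rate * dt                 # then position with the new velocity
        fell = abs(pitch) > fall_limit
        total += step_reward(pitch, pitch_rate, fell)
        if fell:
            break                                # episode terminates on a fall
    return total


if __name__ == "__main__":
    # Untuned (zero) gains versus one hand-picked stabilizing set.
    for gains in [(0.0, 0.0, 0.0), (50.0, 0.0, 5.0), (50.0, 1.0, 5.0)]:
        print(gains, round(episode_return(*gains), 3))
```

With zero gains the toy pendulum falls within the episode and the return drops below -100. A stabilizing gain set keeps the return close to zero. This gap in return is the signal that a policy-gradient agent would exploit when searching over (kp, ki, kd).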

Keywords:

Reinforcement Learning, Proximal Policy Optimization, Advantage Actor-Critic, PID Controller, Self-balancing robot, Automatic tuning

Authors

Kasra Sinaei

Center of Advanced Systems and Technology, University of Tehran, Tehran

Mohammad Reza Ha'iri Yazdi

School of Mechanical Engineering, University of Tehran, Tehran