Improving the Performance of Q-learning Using Simultaneous Q-values Updating
Publication year: 1393 (2014)
Document type: Conference paper
Language: English
Views: 1,123
The full text of this paper is available as a 6-page PDF.
National scientific document ID: IINC02_020
Indexing date: 25 Farvardin 1394 (14 April 2015)
Abstract:
Q-learning is one of the best-known model-free reinforcement learning algorithms. Its goal is to find an estimate of the optimal action-value function, called the Q-value function. The Q-value function is defined as the expected sum of future rewards obtained by taking an action in the current state. The main drawback of Q-learning is that the learning process is expensive for the agent, especially in the beginning steps, because every state-action pair must be visited frequently in order to converge to the optimal policy. In this paper, the concept of the opposite action is used to improve the performance of the Q-learning algorithm, especially in the beginning steps of learning. Opposite actions allow two Q-values to be updated simultaneously: the agent updates the Q-value for each action it takes and for the corresponding opposite action, thereby increasing the speed of learning. The proposed Q-learning method based on the concept of opposite actions is simulated on the well-known grid-world test-bed problem. The results show the ability of the proposed method to improve the learning process.
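For context, the standard Q-learning update after taking action a in state s, receiving reward r, and landing in state s' is Q(s,a) ← Q(s,a) + α[r + γ max_a' Q(s',a') − Q(s,a)]. The sketch below illustrates the simultaneous-update idea on a small grid world: each real step also refines the Q-value of the opposite action. Note that the opposite-action pairing (up↔down, left↔right), the use of the known grid dynamics to evaluate the opposite move, and all hyperparameter values are illustrative assumptions made for this sketch, not the paper's exact formulation.

```python
import numpy as np

# Minimal sketch: Q-learning with a simultaneous "opposite action" update
# on a 5x5 grid world. All constants below are illustrative assumptions.

ROWS, COLS = 5, 5
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]   # up, down, left, right
OPPOSITE = {0: 1, 1: 0, 2: 3, 3: 2}            # index of each action's opposite
GOAL = (4, 4)
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

Q = np.zeros((ROWS, COLS, len(ACTIONS)))       # Q-table: one value per (state, action)

def step(state, a):
    """Apply action a; bumping into a wall leaves the position unchanged."""
    r, c = state
    dr, dc = ACTIONS[a]
    nr = max(0, min(ROWS - 1, r + dr))
    nc = max(0, min(COLS - 1, c + dc))
    reward = 10.0 if (nr, nc) == GOAL else -1.0
    return (nr, nc), reward

def td_update(state, a, reward, next_state):
    """Standard update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = reward + GAMMA * Q[next_state].max()
    Q[state][a] += ALPHA * (target - Q[state][a])

for episode in range(500):
    state = (0, 0)
    while state != GOAL:
        # Epsilon-greedy action selection.
        if np.random.rand() < EPSILON:
            a = np.random.randint(len(ACTIONS))
        else:
            a = int(np.argmax(Q[state]))
        next_state, reward = step(state, a)
        td_update(state, a, reward, next_state)

        # Simultaneous update of the opposite action (illustrative assumption):
        # in this toy grid world the outcome of the opposite move from the same
        # state is known by symmetry, so each real step refines two Q-values.
        opp = OPPOSITE[a]
        opp_next, opp_reward = step(state, opp)
        td_update(state, opp, opp_reward, opp_next)

        state = next_state
```

Because both updates share the same visit to state s, the extra update costs only one additional table write per step while roughly doubling how often Q-values are touched early in training, which is where the abstract claims the largest benefit.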
Authors:
Maryam Pouyan
Department of Electrical and Computer Engineering, University of Hormozgan, Bandar Abbas, Iran
Amin Mousavi
Department of Electrical and Computer Engineering, University of Hormozgan, Bandar Abbas, Iran
Shahram Golzari
Department of Electrical and Computer Engineering, University of Hormozgan, Bandar Abbas, Iran
Ahmad Hatam
Department of Electrical and Computer Engineering, University of Hormozgan, Bandar Abbas, Iran