Improving the Performance of Q-learning Using Simultaneous Q-values Updating

Publication year: 1393 (Solar Hijri)
Document type: Conference paper
Language: English
Views: 1,049

The full text of this paper is available for download as a 6-page PDF file.

National scientific document ID:

IINC02_020

Indexing date: 25 Farvardin 1394

Abstract:

Q-learning is one of the best model-free reinforcement learning algorithms. The goal is to find an estimate of the optimal action-value function, called the Q-value function. The Q-value function is defined as the expected sum of future rewards obtained by taking an action in the current state. The main drawback of Q-learning is that the learning process is expensive for the agent, especially in the beginning steps, because every state-action pair must be visited frequently in order to converge to the optimal policy. In this paper, the concept of opposite actions is used to improve the performance of the Q-learning algorithm, especially in the beginning steps of learning. Opposite actions allow two Q-values to be updated simultaneously: the agent updates the Q-value for each action it takes and for the corresponding opposite action, thus increasing the speed of learning. The novel Q-learning method based on the concept of opposite actions is simulated on the well-known grid-world test-bed problem. The results show the ability of the proposed method to improve the learning process.
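
The abstract describes the update idea only at a high level; the exact rule the authors apply to the opposite action is not given here. The Python sketch below therefore only illustrates the general mechanism of updating two Q-values per step on a grid world: the taken action is updated with the standard one-step Q-learning rule, and, under the purely illustrative assumption that the opposite action corresponds to the reversed transition with a negated reward, its paired Q-value is updated at the same time. The grid size, learning rate, and discount factor are placeholders, not values from the paper.

import numpy as np

# Minimal sketch of Q-learning with simultaneous opposite-action updates on a
# grid world. The opposite-update rule shown here (reversed transition with a
# negated reward) is an assumption for illustration, not the paper's method.

N = 5                                   # assumed grid size (hypothetical)
ACTIONS = [0, 1, 2, 3]                  # up, down, left, right
OPPOSITE = {0: 1, 1: 0, 2: 3, 3: 2}     # each action paired with its opposite
ALPHA, GAMMA = 0.1, 0.95                # assumed learning rate and discount

Q = np.zeros((N * N, len(ACTIONS)))     # tabular Q-values, states flattened row-major

def q_update(s, a, r, s_next):
    """Standard one-step Q-learning update for the state-action pair (s, a)."""
    td_target = r + GAMMA * Q[s_next].max()
    Q[s, a] += ALPHA * (td_target - Q[s, a])

def simultaneous_update(s, a, r, s_next):
    """Update the taken action and, at the same time, its opposite action."""
    q_update(s, a, r, s_next)
    # Assumed opposite update: taking the opposite action from s_next would
    # undo the move, so it is credited with the reversed transition and -r.
    q_update(s_next, OPPOSITE[a], -r, s)

# Example: the agent moves "up" from state 6 to state 1 and receives reward -1;
# both Q[6, up] and Q[1, down] are updated in the same step.
simultaneous_update(6, 0, -1.0, 1)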

Authors

Maryam Pouyan

Electrical and Computer Engineering Department, Hormozgan University, Bandar Abbas, Iran

Amin Mousavi

Electrical and Computer Engineering Department, Hormozgan University, Bandar Abbas, Iran

Shahram Golzari

Electrical and Computer Engineering Department, Hormozgan University, Bandar Abbas, Iran

Ahmad Hatam

Electrical and Computer Engineering Department, Hormozgan University, Bandar Abbas, Iran