Actor Double Critic Architecture for Dialogue System

Publication year: 1402 (Solar Hijri)
Document type: Journal article
Language: English

The full text of this article is available as a 10-page PDF.

National scientific document ID: JR_JECEI-11-2_011

Indexing date: 4 Tir 1402 (Solar Hijri)

Abstract:

Background and Objectives: Most recent dialogue policy learning methods are based on reinforcement learning (RL). However, basic RL algorithms such as the deep Q-network (DQN) have drawbacks in environments with large state and action spaces, such as dialogue systems. Most policy-based methods are slow because they estimate the value of each action by computing the sum of its discounted rewards. In value-based RL methods, function approximation errors lead to overestimated values and, ultimately, to suboptimal policies. Some works try to resolve these problems by combining RL methods, but most of them were applied in game environments or focused only on combining DQN variants. This paper presents, for the first time in dialogue systems, a method that combines actor-critic and double DQN, named Double Actor-Critic (DAC), which significantly improves the stability, speed, and performance of dialogue policy learning.

Methods: To overcome the slow learning of normal DQN, the actor-critic component uses a critic unit that approximates the value function and evaluates the quality of the policy used by the actor, so the actor can learn the policy faster. Moreover, to overcome the overestimation issue of DQN, double DQN is employed. Finally, for smoother updates, a heuristic loss is introduced that chooses the minimum of the actor-critic loss and the double DQN loss.

Results: Experiments on a movie ticket booking task show that the proposed method learns more stably, without the drop that follows overestimation, and reaches the learning threshold in fewer training episodes.

Conclusion: Unlike previous works, which mostly proposed combinations of DQN variants, this study combines DQN variants with actor-critic to benefit from both policy-based and value-based RL methods and to overcome their two main issues, slow learning and overestimation. Experimental results show that the proposed method, as a dialogue policy learner, can conduct a more accurate conversation with a user.
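The abstract names two mechanisms without giving formulas: a double DQN target, which counters overestimation, and a heuristic loss that takes the minimum of the actor-critic loss and the double DQN loss. The sketch below illustrates both on toy numbers. It is not the authors' implementation. The function names, the one-step advantage, the squared-error losses, and every numeric value are assumptions made for illustration only.

```python
import numpy as np


def double_dqn_target(reward, next_q_online, next_q_target, done, gamma=0.9):
    """Standard double DQN target: the online network selects the next
    action and the target network evaluates it."""
    best_action = int(np.argmax(next_q_online))
    bootstrap = 0.0 if done else gamma * float(next_q_target[best_action])
    return reward + bootstrap


def heuristic_loss(actor_critic_loss, double_dqn_loss):
    """Assumed form of the heuristic loss: keep the smaller of the two."""
    return min(actor_critic_loss, double_dqn_loss)


# Toy transition (all values are made up for illustration).
reward, gamma = 1.0, 0.9
next_q_online = np.array([1.0, 3.0, 2.0])   # online net prefers action 1
next_q_target = np.array([0.5, 1.5, 4.0])   # target net's own max is action 2

ddqn_target = double_dqn_target(reward, next_q_online, next_q_target, False, gamma)
plain_dqn_target = reward + gamma * float(np.max(next_q_target))

# Double DQN loss: squared error against the Q-value of the action taken.
q_taken = 2.0
ddqn_loss = (ddqn_target - q_taken) ** 2

# Actor-critic loss (assumed form): policy-gradient term plus critic error,
# using the one-step TD error as the advantage.
value_s, value_next, action_prob = 1.0, 1.5, 0.5
advantage = reward + gamma * value_next - value_s
actor_critic_loss = -np.log(action_prob) * advantage + advantage ** 2

loss = heuristic_loss(actor_critic_loss, ddqn_loss)
print(ddqn_target, plain_dqn_target, loss)
```

In this toy case the double DQN target is lower than the plain DQN target, because the action is chosen by one network and scored by the other. This is the sense in which double DQN reduces overestimation.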

Authors

Y. Saffari

Department of Electrical and Computer Engineering, University of Kashan, Kashan, Iran.

J. Salimi Sartakhti

Department of Electrical and Computer Engineering, University of Kashan, Kashan, Iran.
