Attention Mechanisms in Transformers: A General Survey

Publication year: 1404 (Solar Hijri)
Document type: Journal article
Language: English

The full text of this article is available as an 11-page PDF.



National scientific document ID:

JR_JADM-13-3_008

Indexing date: 12 Shahrivar 1404 (Solar Hijri)

Abstract:

Attention mechanisms have significantly advanced machine learning and deep learning across various domains, including natural language processing, computer vision, and multimodal systems. This paper presents a comprehensive survey of attention mechanisms in Transformer architectures, emphasizing their evolution, design variants, and domain-specific applications in NLP, computer vision, and multimodal learning. We categorize attention types according to design goals such as efficiency, scalability, and interpretability, and provide a comparative analysis of their strengths, limitations, and suitable use cases. The survey also addresses the lack of visual intuition in this area, offering a clearer taxonomy and a discussion of hybrid approaches such as sparse-hierarchical combinations. In addition to foundational mechanisms, we highlight hybrid approaches, theoretical underpinnings, and practical trade-offs. The paper identifies current challenges in computation, robustness, and transparency, offers a structured classification, and proposes future directions. By comparing state-of-the-art techniques, this survey aims to guide researchers in selecting and designing attention mechanisms best suited to specific AI applications, ultimately fostering the development of more efficient, interpretable, and adaptable Transformer-based models.
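As a companion to the abstract, the sketch below shows the scaled dot-product attention that the surveyed Transformer variants build on. It is illustrative only and not code from the paper; the function name, shapes, and toy dimensions are assumptions made for the example.

```python
# Minimal sketch of scaled dot-product attention, the baseline mechanism
# that the efficiency- and scalability-oriented variants modify.
# Illustrative only; not taken from the surveyed paper.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Q, K: (seq_len, d_k); V: (seq_len, d_v). Returns (seq_len, d_v)."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # pairwise similarity, scaled by sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)    # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax -> attention weights
    return weights @ V                              # weighted sum of value vectors

# Toy usage with 4 tokens and hypothetical dimensions
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```

Efficient variants discussed in the survey, such as sparse or linearized attention, avoid materializing the full seq_len-by-seq_len score matrix and thereby reduce the quadratic cost of this baseline.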


Authors

Rasoul Hosseinzadeh

Department of Computer Engineering, Science and Research SR.C., Islamic Azad University, Tehran, Iran.

Mahdi Sadeghzadeh

Department of Computer Engineering, Science and Research SR.C., Islamic Azad University, Tehran, Iran.
