An Overview of Multimodal Natural Language Processing Based on Artificial Intelligence: From Text Translation to Subject-Specific Analysis

Publication year: 1404
Document type: Conference paper
Language: English

The full text of this paper is available as a 7-page PDF.

National scientific document ID: CICTC04_062

Indexing date: 21 Bahman 1404

Abstract:

Multimodal Natural Language Processing (NLP) has made remarkable progress in converting multimedia inputs (text, image, audio) into one another. This review article examines new architectures and solutions in the field, covering tasks such as translation between text, audio, and image, image caption recognition and generation, and analysis of surrounding data. First, the architectures of convolutional neural networks, transformers, and various multimodal encoding models are analyzed; then the advantages, challenges, and directions for future research are discussed.

Authors

Ammar Arab

Student, Department of Computer Engineering, Qo. C., Islamic Azad University, Qom, Iran

Ahmad Sharif

Department of Computer Engineering, Qo. C., Islamic Azad University, Qom, Iran