An Overview of Multimodal Natural Language Processing Based on Artificial Intelligence: From Text Translation to Subject-Specific Analysis
Publication year: 1404 (Solar Hijri)
Document type: Conference paper
Language: English
The full text of this paper is available as a 7-page PDF.
National scientific document ID:
CICTC04_062
Indexing date: 21 Bahman 1404
Abstract:
This review article examines new architectures and solutions in Multimodal Natural Language Processing (NLP), a field that has made remarkable progress in converting multimedia inputs (text, image, audio) into one another. The applications covered include translation among text, audio, and images; recognizing and generating image captions; and analyzing surrounding data. First, convolutional neural network architectures, transformers, and various multimodal encoding models are analyzed; then the advantages, challenges, and directions for future research are discussed.
Authors
Ammar Arab
Student, Department of Computer Engineering, Qo. C., Islamic Azad University, Qom, Iran
Ahmad Sharif
Department of Computer Engineering, Qo. C., Islamic Azad University, Qom, Iran