Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks

Publication year: 1404 SH (2025)
Document type: Journal article
Language: English
Views: 51

This article is available as a 14-page PDF file.

National scientific document ID: JR_IJWR-8-2_002

Indexing date: 16 Khordad 1404 SH (6 June 2025)

Abstract:

Objective: This review explores the trustworthiness of multimodal artificial intelligence (AI) systems, focusing specifically on vision-language tasks. It addresses critical challenges related to fairness, transparency, and ethical implications in these systems, providing a comparative analysis of key tasks such as Visual Question Answering (VQA), image captioning, and visual dialogue.

Background: Multimodal models, particularly vision-language models, enhance AI capabilities by integrating visual and textual data, mimicking human learning processes. Despite significant advances, the trustworthiness of these models remains a crucial concern as AI systems increasingly confront issues of fairness, transparency, and ethics.

Methods: This review examines research published from 2017 to 2024, focusing on the core vision-language tasks named above. It employs a comparative approach to analyze these tasks through the lens of trustworthiness, emphasizing fairness, explainability, and ethics, and synthesizes findings from recent literature to identify trends, challenges, and state-of-the-art solutions.

Results: Several key findings were highlighted.
  • Transparency: The explainability of vision-language tasks is important for user trust; techniques such as attention maps and gradient-based methods have successfully addressed this issue.
  • Fairness: Bias mitigation in VQA and visual dialogue systems is essential for ensuring unbiased outcomes across diverse demographic groups.
  • Ethical implications: Addressing biases in multilingual models and ensuring ethical data handling are critical for the responsible deployment of vision-language systems.

Conclusion: This study underscores the importance of integrating fairness, transparency, and ethical considerations into the development of vision-language models within a unified framework.
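The Results above credit attention maps and gradient-based methods with making vision-language models more transparent. As a concrete illustration only, the following is a minimal sketch of gradient-based input saliency for a VQA-style classifier; the PyTorch framing and the `model(image, question_ids)` interface are assumptions for this sketch, not an implementation from the reviewed literature.

```python
# Minimal sketch: gradient-based input saliency for a vision-language
# classifier. `model` is a hypothetical PyTorch module mapping an image
# tensor of shape (1, C, H, W) and a tokenized question to answer logits.
import torch

def input_saliency(model, image, question_ids, answer_idx):
    """Return an (H, W) heatmap of |d answer-logit / d pixel|."""
    model.eval()
    image = image.clone().requires_grad_(True)  # track gradients w.r.t. pixels
    logits = model(image, question_ids)         # forward pass -> (1, num_answers)
    logits[0, answer_idx].backward()            # gradient of the chosen answer
    # Collapse colour channels by taking the strongest gradient per pixel.
    return image.grad.detach().abs().max(dim=1).values.squeeze(0)
```

Overlaying such a heatmap on the input image shows which regions drove the predicted answer, which is the basic mechanism behind the gradient-based explanation techniques surveyed in the review.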
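The Fairness finding concerns unbiased outcomes across demographic groups. A common diagnostic in the bias-mitigation literature is the per-group accuracy gap; the sketch below computes it over VQA-style evaluation records, assuming an illustrative record schema with `group`, `prediction`, and `answer` fields (not a schema taken from the paper).

```python
# Hedged sketch: per-group accuracy and the accuracy gap, a simple
# group-fairness diagnostic for VQA outputs. Field names are
# illustrative assumptions, not from the reviewed paper.
from collections import defaultdict

def accuracy_by_group(records):
    """records: iterable of dicts with 'group', 'prediction', 'answer' keys."""
    hits, totals = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["group"]] += 1
        hits[r["group"]] += int(r["prediction"] == r["answer"])
    return {g: hits[g] / totals[g] for g in totals}

def accuracy_gap(records):
    """Max minus min per-group accuracy; 0 means accuracy parity."""
    acc = accuracy_by_group(records)
    return max(acc.values()) - min(acc.values())
```

A gap near zero indicates accuracy parity; a large gap flags demographic groups the model underserves and is a natural target for the mitigation techniques discussed in the review.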

Authors

Mohammad Saleh

Department of Computer Engineering, University of Science and Culture, Tehran, Iran

Azadeh Tabatabaei

Department of Computer Engineering, University of Science and Culture, Tehran, Iran
