Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks

Mohammad Saleh; Azadeh Tabatabaei

Published in: International Journal of Web Research, Volume 8, Issue 2

Publication year: 1404 (Solar Hijri)

Document type: Journal article

Language: English

The full text of this article is available as a 14-page PDF.

Permanent link to this article:

https://civilica.com/doc/2280467

National scientific document ID:

JR_IJWR-8-2_002

Indexing date: 16 Khordad 1404

Abstract:

Objective: This review examines the trustworthiness of multimodal artificial intelligence (AI) systems, focusing on vision-language tasks. It addresses challenges of fairness, transparency, and ethics in these systems and provides a comparative analysis of three key tasks: Visual Question Answering (VQA), image captioning, and visual dialogue. Background: Multimodal models, particularly vision-language models, extend AI capabilities by integrating visual and textual data, mimicking human learning processes. Despite significant advances, the trustworthiness of these models remains a crucial concern as AI systems increasingly confront questions of fairness, transparency, and ethics. Methods: This review covers research published from 2017 to 2024 on the aforementioned core vision-language tasks. It takes a comparative approach, analyzing these tasks through the lens of trustworthiness with emphasis on fairness, explainability, and ethics, and synthesizes findings from recent literature to identify trends, challenges, and state-of-the-art solutions. Results: Several key findings are highlighted. Transparency: The explainability of vision-language tasks is important for user trust, and techniques such as attention maps and gradient-based methods have been applied successfully to address it. Fairness: Bias mitigation in VQA and visual dialogue systems is essential for ensuring unbiased outcomes across diverse demographic groups. Ethical Implications: Addressing biases in multilingual models and ensuring ethical data handling are critical for the responsible deployment of vision-language systems. Conclusion: This study underscores the importance of integrating fairness, transparency, and ethical considerations within a unified framework when developing vision-language models.

Keywords:

VQA, Ethical Implications, Trustworthiness, Debiasing, Explainability, Image Captioning, Visual Dialogue

Authors

Mohammad Saleh

Department of Computer Engineering, University of Science and Culture, Tehran, Iran

Azadeh Tabatabaei

Department of Computer Engineering, University of Science and Culture, Tehran, Iran
