A comparative study of biological images generated by selected generative artificial intelligence

Hassan Shojaee-Mend; Reza Mohebbati; Elina Saffarzadeh

Published in: 2nd International Congress on Artificial Intelligence in Medical Sciences

Year of publication: 1404 (Solar Hijri calendar)

Document type: Conference paper

Language: English

The full text of this paper has not been published; only the abstract or extended abstract is available in the database.

Permanent link to this paper:

https://civilica.com/doc/2311105

National scientific document ID:

AIMS02_386

Indexing date: 29 Tir 1404

Abstract:

Background and Aims: Generative AI models for text-to-image (T2I) synthesis have found wide application in scientific and medical fields in recent years because they can produce diverse, realistic images from text descriptions. These models can be useful tools for education and research when producing biological images such as cellular structures, organs, and biological processes. However, their output must be evaluated to ensure its scientific validity. This study aimed to evaluate and compare the accuracy of text-to-image AI models in generating biological images. Methods: Four generative AI models, DALL-E 3 (accessed via ChatGPT), Grok, Gemini, and Stable Diffusion, were selected based on accessibility and technical diversity. A standard set of text descriptions was designed for three biological subjects (kidney, brain, eye) at three levels of complexity (low, medium, high). Nine images were generated by each AI model. Three experts scored the images on a scale of 0 to 5 points. Data were analyzed using the Friedman test (to compare models), the Kruskal-Wallis test (to compare complexity levels), and the intraclass correlation coefficient (ICC) for expert agreement. Results: The mean scores of the models were 2.93 for Gemini, 2.44 for Grok, 2.22 for DALL-E 3, and 1.59 for Stable Diffusion. The Friedman test (statistic: 27.53, p < 0.05) showed that the difference between the models was statistically significant. Gemini performed better on the brain (4.56) and Grok at the high complexity level (3.00), while Stable Diffusion performed very poorly on the kidney (0.00). The Kruskal-Wallis test (statistic: 4.29, p = 0.117) did not show a significant difference between the complexity levels, although the mean score at the low level (3.25) was higher than those at the medium (1.78) and high (1.69) levels. The ICC of 0.989 indicated a very high level of agreement among the experts. Conclusion: In biological image generation, Gemini performed best, Grok and DALL-E 3 performed moderately, and Stable Diffusion performed poorly. The high level of agreement among the experts supports the validity of the evaluations. These findings can provide guidance for
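
The abstract names the Friedman test as the method for comparing the models but does not show how the statistic is obtained. The following is a minimal sketch, in plain Python, of the tie-corrected Friedman chi-square statistic. It is not the authors' analysis code: the function names `average_ranks` and `friedman_statistic`, the choice of what counts as a block (one row per rated item), and the `toy_scores` values are all assumptions made for illustration, and the toy values are invented, not the study's data.

```python
from collections import Counter


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(blocks):
    """Tie-corrected Friedman chi-square statistic.

    blocks: list of rows, one per block (assumed here: one rated item),
    each row holding k scores, one per treatment (here: one per model).
    Undefined (division by zero) if every row is completely tied.
    """
    n = len(blocks)
    k = len(blocks[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in blocks:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
        # Each group of t tied scores within a row contributes t^3 - t.
        tie_term += sum(t**3 - t for t in Counter(row).values())
    sum_sq = sum(r * r for r in rank_sums)
    q = 12.0 * sum_sq / (n * k * (k + 1)) - 3.0 * n * (k + 1)
    correction = 1.0 - tie_term / (n * k * (k * k - 1))
    return q / correction


# Invented illustrative scores (NOT the study's data): 4 rows = blocks,
# 4 columns = hypothetical models A, B, C, D, each scored on a 0-5 scale.
toy_scores = [
    [3, 2, 2, 1],
    [5, 4, 3, 2],
    [2, 3, 1, 0],
    [4, 4, 2, 1],
]
print("Friedman statistic:", friedman_statistic(toy_scores))
```

The statistic is compared against a chi-square distribution with k - 1 degrees of freedom, which is 3 for four models. Ranking within each block is what makes the test suitable for repeated ratings of the same prompts, since it removes differences in difficulty between prompts before the models are compared.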

Authors

Hassan Shojaee-Mend

Infectious Diseases Research Center, Department of General Courses, Faculty of Medicine, Gonabad University of Medical Sciences, Gonabad, Iran

Reza Mohebbati

Department of Physiology, Faculty of Medicine, Gonabad University of Medical Sciences, Gonabad, Iran

Elina Saffarzadeh

Faculty of Medicine, Gonabad University of Medical Sciences, Gonabad, Iran