Intelligent Diagnosis in Veterinary Radiology: Evaluating the Performance of a Large Language Model

Publication year: 1404 (Solar Hijri)
Document type: Conference paper
Language: English

The full text of this paper has not been published; only the abstract or extended abstract is available in the database.

National scientific document ID:

IVSC13_0743

Indexing date: 3 Esfand 1404

Abstract:

Background: Intelligent diagnosis uses technologies such as machine learning and natural language processing to analyze patient data, including laboratory and radiologic data, medical history, and symptoms, in order to deliver accurate diagnoses. Unlike traditional machine learning methods, Transformer-based large language models (LLMs), such as ChatGPT-5, are pre-trained on extensive datasets and can perform various natural language processing tasks, including medical diagnosis. However, studies evaluating LLM performance in veterinary medicine remain limited. This study aimed to assess the diagnostic capability of ChatGPT-5 in clinical veterinary radiology cases across different anatomical regions.

Methods: Patient signalment (age and breed), clinical signs, and radiographic findings were collected from 60 publicly available canine and feline veterinary radiology case reports, each documenting abnormal radiographic findings. All radiographic studies had been interpreted by a board-certified veterinary radiologist. Cases were evenly distributed across anatomical regions: thorax (n=20), abdomen (n=20), and musculoskeletal system (n=20). A separate session was used for each case to prevent influence from prior cases. The same prompt format, populated with patient signalment, clinical signs, and radiographic report findings, was used for all cases to ensure consistency in model input. The model's diagnostic outputs were recorded and compared with the published final diagnoses, which served as the gold standard, and the proportion of correct responses was calculated to estimate accuracy. Additionally, 10 radiographic reports describing normal findings were provided to the model to assess its ability to differentiate between normal and abnormal studies; all 10 were correctly identified as normal.

Results: ChatGPT-5 correctly diagnosed the primary condition in 35 of 60 cases, an overall diagnostic accuracy of 58.3%. Accuracy varied by anatomical region: 45% (9/20) for thoracic cases, 70% (14/20) for abdominal cases, and 60% (12/20) for musculoskeletal cases. Accuracy also varied by species, with 62% for canine cases (n=48) and 41% for feline cases (n=12). The lower accuracy in feline cases may reflect the smaller number of cases and greater variability in feline presentations, and should be interpreted with caution. The correct responses were comprehensive and closely aligned with the published diagnoses. Most errors occurred in complex or multifactorial cases that required additional clinical context, such as cases involving different types of neoplasia.
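The accuracy figures above follow directly from the per-region counts, and the arithmetic can be checked with a short sketch. The Python below is illustrative only and is not the authors' analysis code: the names `REGION_COUNTS`, `accuracy`, and `wilson_interval` are hypothetical, and the Wilson score interval is an added assumption (the abstract reports no confidence intervals), included to show how wide the uncertainty is when a subgroup contains only 20 cases.

```python
from math import sqrt

# Correct/total counts per anatomical region, as reported in the abstract.
REGION_COUNTS = {
    "thorax": (9, 20),
    "abdomen": (14, 20),
    "musculoskeletal": (12, 20),
}


def accuracy(correct, total):
    """Proportion of correct responses."""
    return correct / total


def wilson_interval(correct, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion.

    Assumption: not part of the abstract; shown only to illustrate
    the uncertainty that comes with small subgroup sizes.
    """
    p = correct / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Overall counts are the sums of the per-region counts.
overall_correct = sum(c for c, _ in REGION_COUNTS.values())
overall_total = sum(n for _, n in REGION_COUNTS.values())

for region, (c, n) in REGION_COUNTS.items():
    low, high = wilson_interval(c, n)
    print(f"{region}: {accuracy(c, n):.1%} (95% CI {low:.1%} to {high:.1%})")

print(f"overall: {overall_correct}/{overall_total} = "
      f"{accuracy(overall_correct, overall_total):.1%}")
```

The last line prints `overall: 35/60 = 58.3%`, matching the reported overall accuracy. Species-level counts are not given in the abstract (only percentages and group sizes), so they are deliberately left out of the sketch.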

Authors

Seyedeh Yasaman Razavi Tousi

DVM Student, Faculty of Veterinary Medicine, Karaj Branch, Islamic Azad University, Alborz, Iran.