From Algorithms to Academia: An Endeavor to Benchmark AI-Generated Scientific Papers against Human Standards

Publication year: 1404
Published in: Journal of Bone and General Surgery, Volume: 13, Issue: 4
Dedicated COI code: JR_TABO-13-4_005
Article language: English

Authors

Foot & Ankle Research and Innovation Lab (FARIL), Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

Nour Nassour

Foot & Ankle Research and Innovation Lab (FARIL), Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

John Kwon

Foot & Ankle Research and Innovation Lab (FARIL), Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

Soheil Ashkani-Esfahani

Foot & Ankle Research and Innovation Lab (FARIL), Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

Mitchel Harris

Foot & Ankle Research and Innovation Lab (FARIL), Department of Orthopaedic Surgery, Massachusetts General Hospital, Harvard Medical School, Boston, MA, USA

Abstract

Objectives: This study quantitatively investigates the accuracy of text generated by AI large language models and compares its readability and likelihood of acceptance to a scientific journal with those of human-authored papers on the same topics.
Methods: The study comprised two papers written by ChatGPT, two papers written by Assistant by scite, and two papers written by humans. Six independent reviewers, blinded to the authorship of each paper, graded each subsection on a scale of 1 to 4. Each reviewer was also asked to guess whether the paper was written by a human or by AI and to explain their reasoning. The study authors additionally graded each AI-generated paper on the factual accuracy of its claims and citations.
Results: The human-written calcaneus fracture paper received the highest score (3.70/4), followed by the Assistant-written calcaneus fracture paper (3.02/4), the human-written ankle osteoarthritis paper (2.98/4), the ChatGPT calcaneus fracture paper (2.89/4), the ChatGPT ankle osteoarthritis paper (2.87/4), and the Assistant ankle osteoarthritis paper (2.78/4). The human calcaneus fracture paper was rated statistically significantly higher than the ChatGPT calcaneus fracture paper (P = 0.028) and the Assistant calcaneus fracture paper (P = 0.043). In factual accuracy, the ChatGPT ankle osteoarthritis review scored 100%, the ChatGPT calcaneus fracture review 97.46%, the Assistant calcaneus fracture review 95.56%, and the Assistant ankle osteoarthritis review 94.98%. In citation accuracy, the ChatGPT ankle osteoarthritis paper scored 90%, the ChatGPT calcaneus fracture paper 69.23%, the Assistant ankle osteoarthritis paper 35.14%, and the Assistant calcaneus fracture paper 39.68%.
Conclusion: Through this paper we emphasize that while AI holds the promise of enhancing knowledge sharing, it must be used responsibly and in conjunction with comprehensive fact-checking procedures to maintain the integrity of the scientific discourse. Level of evidence: III

Keywords

Artificial intelligence, ChatGPT, Large Language Models, Natural Language Processing, Prompt Engineering
